Preface



Data management systems for large applications seek infrastructural support
from database systems and distributed computing systems. These systems evolve
with the changing requirements of applications. New research efforts are
being made to support electronic commerce (EC) applications and applications
that handle voluminous data. At the same time, support techniques are being
studied in data mining and knowledge discovery, object-relational database
management systems, and mobile computing systems. The international workshop on
Databases in Networked Information Systems (DNIS) 2000 has been organized
to emphasize research activity on these support systems.

DNIS 2000 is being held on 4-6 December 2000 at the University of Aizu in
Japan. The workshop program includes research contributions and invited
contributions. The session on evolving data management systems provides a view
of research activity in this area and of related research issues. The invited
contributions to this session are by Dr. Qiming Chen, Dr. Umeshwar Dayal,
and Dr. Meichun Hsu, and by Masaru Kitsuregawa, Takahiko Shintani, Takeshi
Yoshizawa, and Iko Pramudiono. The workshop session on database systems
includes contributed papers by Dr. Jayant R. Haritsa and by Professor Divyakant
Agrawal and Dr. Amr El Abbadi. The session on networked information systems
includes the invited contribution by Professor Krithi Ramamritham, Pavan
Deolasee, Amol Katkar, Ankur Panchbudhe, and Prashant Shenoy.

I would like to thank the members of the program committee for their
support and all authors who submitted research contributions to DNIS 2000.

The sponsoring organizations and the organizing committee deserve praise 
for the support they provided. A number of individuals have contributed to the 
success of the workshop. I thank Professor P.C.P. Bhatt, Professor J. Biskup,
Dr. Carl W. Vilbrand, and Professor M. Kitsuregawa for providing continuous
support and encouragement.

I have received invaluable support from the University of Aizu. Professor 
Shoichi Noguchi, President, provided encouragement and took interest in the 
formation of the plan. I thank Professor Shunji Mori, Head of the Department of
Computer Software, for making the support available. I express my gratitude to
the members and chairman of the International Affairs Committee for support- 
ing the workshop proposal. I also thank all my colleagues at the university for 
their cooperation and support. 



September 2000 



Subhash Bhalla 
OLAP-Based Data Mining for Business Intelligence
Applications in Telecommunications and E-commerce 

Qiming Chen, Umeshwar Dayal, and Meichun Hsu 

Hewlett-Packard Laboratories 
1501 Page Mill Road, Palo Alto, CA 94304, USA 
{qchen, dayal, mhsu}@hpl.hp.com



Abstract. Business intelligence applications require the analysis and mining of 
large volumes of transaction data to support business managers in making in- 
formed decisions. In a telecommunication network, hundreds of millions of call 
detail records are generated daily. Business intelligence applications such as 
fraud detection and churn analysis require the collection and mining of these 
records on a continuous basis. Similarly, electronic commerce applications re- 
quire the analysis of millions of shopping transaction records daily to guide 
personalized marketing, promotional campaigns, and fraud detection. An im- 
portant component of many of these applications is customer profiling, which 
aims to extract patterns of behavior from a collection of transaction records
and to compare such patterns. The high data volumes and data flow rates
pose serious scalability and performance challenges. We show how a scalable 
data-warehouse/OLAP framework for customer profiling and pattern compari- 
son can meet these performance requirements. Also, since it is important in 
many business intelligence applications to collect and analyze transaction rec- 
ords continuously, rather than in batches, we show how to automate the whole 
operation chain, including data capture, filtering, loading, and incremental 
summarization and analysis. 



1 Introduction 

Business Intelligence is the gathering, management, and analysis of large amounts of 
data on a company’s customers, products, services, operations, suppliers, and partners 
and all the transactions in between. Examples of business intelligence applications 
include traffic analysis, fraud detection, and customer loyalty (churn) analysis in the 
telecommunications industry, which require analyzing large volumes of call detail 
records [1,2]; and target marketing, market basket analysis, customer profiling, and 
fraud detection in the e-commerce industry, which require analyzing large volumes of 
shopping transaction data from electronic storefront sites [3]. Typically, business intelli- 
gence applications involve extracting data from operational databases, transforming the data, 
and loading it into data warehouses. The data in the warehouse is then input to a variety of 
reporting, querying, on-line analytical processing (OLAP) and data mining tools, which generate
patterns, rules, or models used by business managers for making decisions. These decisions
are then fed back in the form of business actions such as product recommendations or fraud 
alerts. Figure 1 shows a typical architecture for business intelligence. 

S. Bhalla (Ed.): DNIS 2000, LNCS 1966, pp. 1-19, 2000. 

© Springer-Verlag Berlin Heidelberg 2000 
[Figure: architecture diagram; recoverable label: Reporting and Mining Components]

Fig. 1. Data Warehousing and OLAP-based Architecture for Business Intelligence
Applications

An important component of business intelligence applications is customer profiling, 
which aims at extracting typical or abnormal patterns of behavior. For telecommuni- 
cation applications, for instance, a customer’s calling behavior is represented by the 
composition and periodic appearance of the callees (the persons called); time- 
windows (when calls are made); and duration (how long calls last). For e-commerce 
applications, a customer’s shopping behavior could be represented by ads viewed, 
products selected, products bought, time-windows, price, etc. The techniques for 
customer profiling and comparison are very similar. In the rest of this paper, we will
illustrate the techniques using telecom examples. (The reader is referred to [12] for
e-commerce examples.)

Calling behavior profiling has become increasingly important in a variety of telecom
applications, such as fraud detection, service planning, and traffic analysis. For
example, various types of toll frauds have been discovered where the fraudsters’ calling
behavior may be either abnormal or normal [20]. In both cases, calling behavior pro- 
filing and pattern matching are key to fraud detection, as illustrated below: 

• A caller’s behavior may be considered abnormal if some thresholds of calling 
volume, duration and their aggregations are exceeded. However, without informa- 
tion about individuals’ calling behaviors, thresholds can only be set universally in 
a conservative way, such as “a call is suspicious if its duration exceeds 24 hours”. 
Fraud detection based on such thresholds may be inaccurate. With the
availability of a customer’s individual calling behavior profile, personalized or
group-based thresholds can be set, which provide more precise fraud detection
than generalized thresholds.
• In service stealing cases, even if the fraudster’s calling volume, duration, etc., are
below the preset thresholds, his calling behavior may still be abnormal in the sense
that the calling destination, time, etc., differ significantly from those of the original
subscriber. Therefore, detecting dissimilar calling behaviors becomes a key
indicator of this type of fraud.

• A different kind of fraud, called subscription fraud, is characterized by normal
calling behaviors. It occurs when service is ordered but there is no intention of
paying. Typically, a fraudster subscribes for a telephone service under an assumed
identity with a clean credit record. The fraudster uses the telephone service without
paying or makes only partial payments. Later, the fraudster opens another
account under a different identity, and repeats the procedure under that new identity.
Very often, the fraudster has normal calling behavior, so the key indicator of this
type of fraud is detecting calling behaviors that are similar to those of known
fraudsters.

Customer calling behavior profiles are generated from Call Detail Records (CDRs).
The volume and flow rates of CDRs are typically very large: hundreds of millions of
CDRs are created every day, and CDRs must be continuously collected and analyzed
to keep customer profiles up to date. Thus, scalability becomes a critical issue for
mining CDRs in real time. At Hewlett-Packard Laboratories, we have developed a
data warehouse and OLAP based framework for business intelligence that we have
used for customer profiling and comparison [6,9,11,12].

Since the similarity of customer behavior can be represented from different angles,
we compare calling patterns derived from customer calling behavior profiles, rather
than comparing profiles directly. For example, some calling patterns might be similar
in terms of the volume (total number or duration) of calls to a set of callees, others
might be similar in terms of the time windows when these calls were made. Our
objective, therefore, is to enable the comparison of calling patterns along multiple
dimensions and at multiple levels of the dimension hierarchies. This type of
multidimensional, multi-level pattern extraction and comparison is facilitated through the
use of OLAP. Like many existing efforts, we take advantage of OLAP technology for
analyzing data maintained in data-warehouses [1,13,14,17]. In particular, we perform
large-scale data mining on an OLAP based computation platform [18,19]. However,
to our knowledge, there is no prior work reported on OLAP based customer behavior
profiling and pattern analysis.

Our approach has the following major features. First, we have integrated data ware- 
housing and OLAP technologies to provide a scalable data management and data 
mining framework. While providing multilevel and multidimensional data analysis 
[2,3,21,25,27], we emphasize the use of OLAP servers as scalable computation en- 
gines rather than only as front-end analytical tools. Thus, we have actually imple- 
mented the whole application through “OLAP programming”, i.e., as programs writ- 
ten in the scripting language supported by the OLAP server. 
Next, we show how to extract calling behavior patterns from customer profiles, and
we introduce multilevel and multidimensional probability distribution based pattern 
representations. 

We develop a formalism for defining and computing multi-dimensional and multi- 
level similarity measures. We show how the calling patterns that represent customers’ 
calling behaviors can be compared using these similarity measures. There exist a 
number of similarity comparison mechanisms applied to text retrieval, image pattern
recognition, etc. [5,7,15,16,23,24,26]. However, our approach is unique in comparing
behavior patterns based on cube similarity, and in providing multilevel and multi- 
dimensional cube similarity measures. 

Finally, our framework supports profiling customer calling behavior incrementally on
a continuous basis [6,9]. This differentiates our approach from most CDR mining
efforts that are focused on analyzing historical data. The incremental and parallel 
OLAP architecture further supports scalability. 

Section 2 introduces the system architecture for OLAP and data warehouse based 
calling behavior profiling and multilevel multidimensional pattern analysis. Section 3 
describes the formalism for cube similarity comparison. Section 4 gives some conclu- 
sions and future directions. 



2 OLAP-Based Customer Profiling 

In this section, we first describe how customer profiles can be represented as data 
cubes. Then, we describe the architecture of a profiling engine, and the process of 
using the engine to compute profiles and calling patterns. 



2.1 Representing Customer Behavior by Data Cubes 

We measure customer profiles and calling patterns in multiple dimensions and at 
multiple levels, and in terms of volumes and probability distributions. These measures 
are expressed in the form of data cubes. A cube $C$ has a set of underlying dimensions
$D_1, \ldots, D_n$, and is used to represent a multidimensional measure. Each cell of the cube
is identified by one element value from each of the dimensions, and contains a value
of the measure. We say that the measure is dimensioned by $D_1, \ldots, D_n$. The set of
elements of a dimension $D$, called the domain of $D$, may be limited to a subset. A
sub-cube (slice or dice) can be derived from a cube $C$ by dimensioning $C$ by a subset of its
dimensions, and/or by limiting the element sets of these dimensions.
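To make the cube vocabulary concrete, here is a minimal Python sketch, assuming a cube is held as a sparse mapping from a tuple of dimension values (one per dimension) to a measure value. It is only an illustration of the concepts, not how Oracle Express stores cubes; the names `Cube`, `dice` and `slice_cube` are hypothetical. A dice limits element sets while keeping every dimension; a slice fixes values on some dimensions, so the result is dimensioned by the remaining ones.

```python
from typing import Dict, List, Mapping, Sequence, Set, Tuple

# A sparse cube (hypothetical representation): each non-empty cell is keyed by
# a tuple holding one value per dimension, in the order given by `dims`.
Cube = Dict[Tuple[str, ...], int]


def dice(cube: Cube, dims: Sequence[str],
         limits: Mapping[str, Set[str]]) -> Cube:
    """Limit the element sets of some dimensions; all dimensions are kept."""
    pos = {d: i for i, d in enumerate(dims)}
    return {key: val for key, val in cube.items()
            if all(key[pos[d]] in allowed for d, allowed in limits.items())}


def slice_cube(cube: Cube, dims: Sequence[str],
               fixed: Mapping[str, str]) -> Tuple[List[str], Cube]:
    """Fix one value on each dimension in `fixed`; the resulting sub-cube is
    dimensioned by the remaining dimensions only."""
    pos = {d: i for i, d in enumerate(dims)}
    rest = [d for d in dims if d not in fixed]
    sub: Cube = {}
    for key, val in cube.items():
        if all(key[pos[d]] == v for d, v in fixed.items()):
            sub[tuple(key[pos[d]] for d in rest)] = val
    return rest, sub


dims = ("duration", "time", "callee")
cube: Cube = {("short", "morning", "A"): 3,
              ("long", "morning", "A"): 1,
              ("short", "evening", "B"): 2}
print(dice(cube, dims, {"time": {"morning"}}))
print(slice_cube(cube, dims, {"time": "morning"}))
```

Storing only non-empty cells keeps the sketch in the spirit of the sparse composite dimensions discussed below.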

We first have to decide which features (dimensions) are relevant. For our calling 
behavior profiling application, the features of interest are the phone-numbers, volume 
(the number of calls), duration, time of day, and day of week for a customer’s outgoing
and incoming calls. Next, we have to select the granularity of each feature. Thus,
the time of day feature may be represented by the time-bins ‘morning’, ‘afternoon’,
‘evening’ or ‘night’; the duration feature may be represented by ‘short’ (shorter than
20 minutes), ‘medium’ (20 to 60 minutes), or ‘long’ (longer than 60 minutes). Finally,
we have to decide the profiling interval (e.g., 3 months) during which the customer
profiles will be constructed, and the periodicity of the profiles (e.g., weekly).
Thus, in our application, a customer’s profile is a weekly summarization of his calling
behavior during the profiling interval.

A profile cube, say PF (defined below in the language of Oracle Express, the OLAP 
server we use), is a volume cube. It holds the counts of calls during the profiling 
period, dimensioned by caller, callee, time, duration and dow (day of week), where
dimension time has values ‘morning’, ‘evening’, etc.; duration has values ‘short’,
‘long’, etc.; and dimensions caller and callee contain the calling and called phone
numbers.

define PF variable int <sparse <duration time dow callee caller>> inplace

Note that the keyword “sparse” in the above definition instructs Oracle Express
to create a composite dimension <duration time dow callee caller>, in order to
handle sparseness, particularly between calling and called numbers, in an efficient 
way. A composite dimension is a list of dimension-value combinations. A combina- 
tion is an index into one or more sparse data cubes. The use of a composite dimension 
allows storing sparse data in a compact form similar to relational tuples. 

A cell in the cube is identified by one value from each of these dimensions. For
example, the value in the cell identified by duration = ‘short’, time = ‘morning’, caller
= ‘6508579191’, callee = ‘6175552121’, is the number of calls made from
‘6508579191’ to ‘6175552121’ in the mornings (say, 8am-12pm) that are ‘short’ (say,
less than 20 min), during the profiling period.

A volume cube, say PF, is populated by means of binning. A call data record contains
fields with values mapping to each dimension of the PF cube (Figure 2). For example,
‘10:39am’ is mapped to time-bin ‘morning’. Binning can be a computational
procedure, and the results computed from the raw data are used to update cube cells; for
instance, call duration is calculated as the time difference from the initial address
message to the release message.
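Binning can be sketched in Python as follows, under stated assumptions: the `CDR` record type with set-up and release timestamps is hypothetical (real CDRs carry many more fields), the duration bins follow the text, and the time-of-day boundaries other than the suggested 8am-12pm morning are invented for illustration. Each record increments one cell of a sparse volume cube keyed like PF.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Tuple

DOW = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]  # datetime.weekday() order


class CDR(NamedTuple):
    """Hypothetical minimal call detail record."""
    caller: str
    callee: str
    iam_time: datetime      # initial address message: call set-up
    release_time: datetime  # release message: call tear-down


def duration_bin(minutes: float) -> str:
    # Bins from the text: short < 20 min, medium 20-60 min, long > 60 min.
    if minutes < 20:
        return "short"
    return "medium" if minutes <= 60 else "long"


def time_bin(hour: int) -> str:
    # Hypothetical boundaries; the text only suggests morning is about 8am-12pm.
    if 8 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 23:
        return "evening"
    return "night"


# Cell keys follow the dimension order of PF: (duration, time, dow, callee, caller).
Key = Tuple[str, str, str, str, str]


def populate(cdrs: Iterable[CDR]) -> Dict[Key, int]:
    pf: Dict[Key, int] = defaultdict(int)
    for r in cdrs:
        # Duration is computed: time from the initial address message to release.
        minutes = (r.release_time - r.iam_time).total_seconds() / 60
        key = (duration_bin(minutes), time_bin(r.iam_time.hour),
               DOW[r.iam_time.weekday()], r.callee, r.caller)
        pf[key] += 1  # a volume cube counts calls per cell
    return dict(pf)
```

For example, a call that starts at 10:39 on a Monday and lasts 5 minutes adds 1 to the cell ('short', 'morning', 'MON', callee, caller).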

Various cubes can be derived as formulas from the above basic cube. The ability to
use formulas to define measures over a multi-dimensional space is a powerful feature
of OLAP tools. Further, cubes can be computed from other cubes through OLAP
programming.
Fig. 2: Populating Cube by Binning



2.2 Profiling Engine 

To create and update customer behavior profiles, hundreds of millions of call records 
must be processed every day. The profiling engine is built on top of an Oracle-8 
based data-warehouse and Oracle Express, an OLAP server. CDRs are fed into the 
data warehouse daily and dumped to archive after use. Customer behavior profiles 
and other reference data are persistently stored in the warehouse, and handled in the 
OLAP multidimensional database (MDB) as data cubes. The profiling engine archi- 
tecture and flow of data is shown in Figure 3. 

Customers’ calling behavior profiles are built and incrementally updated by staging 
data between the data-warehouse and the OLAP multidimensional database [17-21]. 
CDRs are loaded into call data tables in the data-warehouse, and then loaded to the 
OLAP server to generate a profile-snapshot cube that is multi-customer oriented. In 
parallel with the above step, a profile cube covering the same area is retrieved from 
the data-warehouse. A profile cube and a profile-snapshot cube have the same un- 
derlying dimensions. The profile cube is updated by merging it with the profile- 
snapshot cube. The updated profile cube is stored back to profile tables in the data- 
warehouse. The frequency of data exchange between the data-warehouse and the 
OLAP server is controlled by certain data staging policies. The size of each profile 
cube may be controlled by partitioning the customers represented in a profile cube by 
area and by limiting the profiling period. To reduce data redundancy and query cost,
we chose to maintain minimal data in the profile tables in the data-warehouse. We 
include multiple customers’ calling information in a single profile table or profile 
cube, without separating information on outgoing calls and incoming calls. We make 
the relational schema of the profile table directly correspond to the base level of the 
profile cube. Derivable values at higher levels are not maintained in the data- 
warehouse. 
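The merge step can be sketched in Python, assuming both the profile cube and the profile-snapshot cube hold raw call counts over the same underlying dimensions, so that merging amounts to cell-wise addition; the paper does not spell out the merge semantics, and `merge_profile` and `to_rows` are hypothetical names. The second function mirrors the choice of a profile table whose schema matches the base level of the cube.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, ...]


def merge_profile(profile: Dict[Key, int], snapshot: Dict[Key, int]) -> Dict[Key, int]:
    """Merge a profile-snapshot cube into a profile cube with the same
    underlying dimensions, assuming both hold raw call counts so that
    merging means cell-wise addition. The inputs are left unchanged."""
    merged = dict(profile)
    for key, count in snapshot.items():
        merged[key] = merged.get(key, 0) + count
    return merged


def to_rows(cube: Dict[Key, int]) -> List[Tuple]:
    """Flatten non-empty base-level cells into relational tuples
    (dimension values..., count), one row per cell."""
    return sorted(key + (count,) for key, count in cube.items() if count)
```

Because only base-level cells are written back, derivable higher-level totals never need to be stored in the warehouse.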
Fig. 3: Data-warehouse/OLAP based profiling engine for fraud detection



From the calling profile cubes, individual customer based, multilevel and multidi- 
mensional calling patterns are derived. The similarity of calling patterns belonging to 
different customers, or belonging to the same customer but for different profiling 
periods, can then be computed. The profiling engine can be used by a fraud detection 
application to generate alarms when suspicious events are detected (e.g., a call exceeds
some threshold, an abnormal calling pattern occurs, or a pattern similar to a
known fraudulent one occurs). An investigator can then examine the case database to 
determine whether a fraud indeed occurred. 



2.3 Multi-level and Multi-dimensional Calling Pattern Cubes

A calling pattern cube is associated with a single customer for representing the indi- 
vidual calling behavior of that customer. Multiple calling pattern cubes may be gen- 
erated to represent a customer’s calling behavior from different aspects. They may be 
based on volumes or probability distributions; and they may be materialized (defined 
as variables) or not (defined as formulas). In our design, probability-based calling 
pattern cubes are derived from volume-based ones. 

Volume based patterns 

A volume based calling pattern summarizes a customer’s calling behavior by count- 
ing the number of calls of different duration in different time-bins. Represented as 
cubes, they are commonly dimensioned by time, duration and dow (day of week), and 
in addition, for those related to outgoing calls, dimensioned by callee, and for those 
related to incoming calls, dimensioned by caller. Their cell values (measures) repre- 
sent the number of calls. For example, a calling pattern might express that there were 
300 short calls, 100 medium calls, and 50 long calls in the mornings of the profiling 
period, from one specific phone number to another. 
Calling pattern cubes are derived from profile cubes. For example, the outgoing and
incoming calling behavior of a customer X whose phone number is 1002003000, is 
defined by cubes 

define Cx.out variable int <duration time dow callee>
define Cx.in variable int <duration time dow caller>

and can be easily extracted from profile cube PF by 

Cx.out = PF (caller '1002003000') 

Cx.in = PF (callee '1002003000') 
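In Python terms, and only as an analogy to the Express expressions above, extracting a customer's patterns is a slice of the sparse PF cube: keep the cells whose caller (or callee) equals the customer's number and drop that dimension. The key layout (duration, time, dow, callee, caller) and the function names are hypothetical.

```python
from typing import Dict, Tuple

# PF cell keys: (duration, time, dow, callee, caller) -> number of calls.
PFKey = Tuple[str, str, str, str, str]
PatternKey = Tuple[str, str, str, str]


def calls_out(pf: Dict[PFKey, int], number: str) -> Dict[PatternKey, int]:
    """Like Cx.out = PF(caller number): keep the caller's cells and drop the
    caller dimension, leaving (duration, time, dow, callee)."""
    return {k[:4]: v for k, v in pf.items() if k[4] == number}


def calls_in(pf: Dict[PFKey, int], number: str) -> Dict[PatternKey, int]:
    """Like Cx.in = PF(callee number): keep the callee's cells and drop the
    callee dimension, leaving (duration, time, dow, caller)."""
    return {k[:3] + (k[4],): v for k, v in pf.items() if k[3] == number}
```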



Probability distribution based patterns 

A probability distribution based calling pattern represents a customer’s calling 
behavior with probability distributions. For example, a calling pattern from one 
phone number to another might express that 70% of the calls in the morning were 
short, 20% were medium, 10% were long. 
Figure 4: Probability cubes derived from a volume cube



Cubes representing probability distribution-based calling patterns provide more fine- 
grained representation of dynamic behavior than volume-based ones [22]. They also 
allow calling patterns corresponding to different lengths of the profiling interval to be 
compared. For threshold-based fraud detection applications, a long-duration call may
be noticed before it reaches an absolute threshold, and monitored more closely as
the probability of fraud increases. Probability distribution-based
calling patterns also provide more details of individual behavior, not seen in fixed- 
value based calling patterns. 

As shown in Figure 4, cubes representing different probability measures can be
derived from a profile cube or a volume-based pattern cube. For example, cube Cpx.out
for a customer represents the dimensioned probability distribution of outgoing calls
over all the outgoing calls made by this customer, and is derived from Cx.out in the
following way

define Cpx.out formula (Cx.out / total(Cx.out)) decimal <duration time dow callee>
This formula says that Cpx.out is a decimal cube dimensioned by duration, time, dow
and callee and computed by formula (Cx.out / total(Cx.out)). The value of a cell is the
above probability corresponding to the underlying dimension values.

Another cube, Cppx.out, representing the probability distribution of a customer’s
outgoing calls over his total calls to the corresponding callee, can also be derived from
Cx.out as follows:

define Cppx.out formula (Cx.out / total(Cx.out, callee)) decimal <duration time dow callee>
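The two formulas can be mirrored in Python as a sketch, assuming Cx.out is a sparse mapping from (duration, time, dow, callee) to call counts. `total(Cx.out)` becomes the sum of all cells, and `total(Cx.out, callee)` becomes a per-callee sum over the other dimensions. The function names are hypothetical; in the system described here these are Express formulas evaluated on the fly.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Cx.out cells keyed by (duration, time, dow, callee); values are call counts.
Key = Tuple[str, str, str, str]


def cp_out(cx_out: Dict[Key, int]) -> Dict[Key, float]:
    """Analogue of Cpx.out = Cx.out / total(Cx.out)."""
    total = sum(cx_out.values())
    return {k: v / total for k, v in cx_out.items()} if total else {}


def cpp_out(cx_out: Dict[Key, int]) -> Dict[Key, float]:
    """Analogue of Cppx.out = Cx.out / total(Cx.out, callee): the denominator
    is the total number of calls to the cell's callee."""
    per_callee: Dict[str, int] = defaultdict(int)
    for (_dur, _time, _dow, callee), v in cx_out.items():
        per_callee[callee] += v
    return {k: v / per_callee[k[3]] for k, v in cx_out.items() if per_callee[k[3]]}
```

With 7 short, 2 medium and 1 long morning calls to one callee, `cpp_out` reproduces the 70%/20%/10% pattern used as an example earlier in this section.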

For efficiency as well as consistency, it is only necessary to store profile cubes per- 
sistently in the data warehouse. Calling patterns, either based on volume or probabil- 
ity, can be derived on the fly (at analysis time) using the OLAP engine for computa- 
tion. This shows the simplicity, and yet the power, of OLAP for customer profiling. 

Hierarchical dimensions for multilevel pattern representation 

To represent customer calling patterns at multiple levels, dimensions dow, time and 
duration are defined as hierarchical dimensions, along which the calling pattern cubes 
can be rolled up. 

A hierarchical dimension D contains values at different levels of abstraction. Associ- 
ated with D there are a dimension DL describing the levels of D, a relation DL_D 
mapping each value of D to the appropriate level, and a relation D_D mapping each 
value of D to its parent value (the value at the immediate upper level). Let D be an 
underlying dimension of a numerical cube C such as a volume-based calling pattern 
cube. D, together with DL, DL_D and D_D, fully specifies a dimension hierarchy.
Together they provide sufficient information to roll up cube C along dimension D, that is, to
calculate the total of cube data at the upper levels using the corresponding lower-level 
data. A cube may be rolled up along multiple underlying dimensions. 

For example, the dow (day of week) hierarchy is made of dimensions dow and 
dowLevel, and relations dowLevel_dow and dow_dow, as illustrated in Figure 5. 
Analogously, the duration hierarchy is made of dimensions duration and durLevel, 
and relations durLevel_dur and dur_dur. The durLevel dimension has values 'dur_bin' 
and 'dur_all'; the duration dimension has values ‘short’, ‘medium’, ‘long’ at 'dur_bin'
level, and 'all' at 'dur_all' level (top level). The time hierarchy is made up of
dimension time, dimension timeLevel with values 'day', ‘month’, ‘year’ and ‘top’, parent
relation time_time, and level relation timeLevel_time.



dow dimension values 


dowLevel 


dow_dow 


Monday, .., Friday 


dd 


wkday 


Saturday, Sunday 


dd 


wkend 


wkday, wkend 


ww 


week 


Week 


week 


NA 



Fig. 5 : Dimension hierarchy (day of week) 
In a volume-based calling pattern cube, each cell value is a quantitative measure. The
cube may be rolled up along its hierarchical dimensions. For example, Cx.out and
Cx.in can be rolled up along dimensions duration, time and dow. Thus,

Cx.out(duration 'short', time 'morning', dow 'MON')

measures the number of short-duration calls this customer made to each callee (di- 
mensioned by callee) on Monday mornings during the profiling interval. After this 
cube is rolled up, 

Cx.out(duration 'all', time 'allday', dow 'week')

measures the total number of calls this customer made to each callee (total calls di- 
mensioned by callee) during the profiling interval. 
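A roll-up along the dow hierarchy can be sketched in Python using only the parent relation dow_dow from Fig. 5: each base-level cell is added into every ancestor cell (weekday or weekend, then week). This assumes the input cube holds only base-level (day) cells, uses the 'MON'...'SUN' spelling from the examples, and the names are hypothetical; rolling up along duration and time would follow the same pattern with their parent relations.

```python
from collections import defaultdict
from typing import Dict, Iterator, Tuple

# Parent relation dow_dow of the day-of-week hierarchy (Fig. 5).
DOW_PARENT = {"MON": "wkday", "TUE": "wkday", "WED": "wkday", "THU": "wkday",
              "FRI": "wkday", "SAT": "wkend", "SUN": "wkend",
              "wkday": "week", "wkend": "week"}


def ancestors(value: str) -> Iterator[str]:
    """Yield the parent, grandparent, ... of a dow value."""
    while value in DOW_PARENT:
        value = DOW_PARENT[value]
        yield value


def rollup_dow(cube: Dict[Tuple[str, ...], int], pos: int) -> Dict[Tuple[str, ...], int]:
    """Roll a volume cube up along the dow dimension at key position `pos`,
    keeping the base cells and adding totals at every upper level.
    Assumes the input holds only base-level (day) cells."""
    out: Dict[Tuple[str, ...], int] = defaultdict(int, cube)
    for key, val in cube.items():
        for parent in ancestors(key[pos]):
            out[key[:pos] + (parent,) + key[pos + 1:]] += val
    return dict(out)
```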

It does not make sense to roll up probability-based pattern cubes. However, such a 
cube may be defined on a volume cube that has already been rolled up. 



3 Calling Pattern Similarity Comparison 

Calling pattern comparison is important for such applications as fraud detection. 
Since the similarity of customer behavior can be represented from different angles, 
we compare calling patterns derived from customer calling behavior profiles, rather 
than comparing profiles directly. For example, some calling patterns might be similar 
in the volume of calls to the same set of callees, others might be similar in the time of 
these calls such as late nights. Our objective, therefore, is to enable the comparison of 
calling patterns along multiple dimensions and at multiple levels of the dimension 
hierarchies. 

Given two input calling pattern cubes, say $C_1$ and $C_2$, the output of the comparison is a
similarity cube, say $C_s$, rather than a single value. The similarity cube $C_s$ can be
dimensioned differently from the cubes $C_1$ and $C_2$ being compared. Each cell of $C_s$
represents the similarity of a pair of corresponding sub-cubes (slices or dices) of $C_1$ and $C_2$.

Computing similarity cubes requires the following: 

• The mapping from a cell of $C_s$ to a pair of corresponding sub-cubes of $C_1$ and $C_2$.

• The algebraic structure for summarizing cell-wise comparison results of a pair of
sub-cubes to a single similarity measure, which will be stored in the corresponding
cell of $C_s$. We have introduced two similarity measures. One treats a sub-cube as a
bag, and summarizes cell-wise comparison results based on bag overlap. The other 
treats a sub-cube as a vector, and summarizes cell-wise comparison results based 
on vector distance. 



To compare customers’ behavior from multiple aspects (e.g. from the perspective of 
volume or probability distribution), different calling patterns may need to be com- 
pared. Hence, multiple similarity cubes may need to be generated. The choice of 
which calling pattern cubes to use and which similarity cubes to generate is
application-specific.



3.1 Bag-Overlap-Based Cube Similarity 

Bag overlap-based similarity is applicable to a special kind of cube, called a count- 
cube. The cell-values of a count-cube are non-negative integers for measuring counts. 
A volume-based calling pattern cube is a count cube. 

The similarity of two bags may be defined on the basis of their overlap. Let $A$ and $B$
be bags; let $o_A(x)$ be the count of element $x$ in bag $A$, that is, the number of copies of $x$
in $A$; and let $\subseteq$, $\cap$ and $\cup$ be bag containment, intersection, and union as defined
below [4].

$$A \subseteq B \iff (\forall x \in A)\; o_A(x) \le o_B(x)$$

$$A \cap B = \mathrm{glb}_{\subseteq}(A, B)$$

where $\mathrm{glb}_{\subseteq}$ stands for the greatest lower bound of $A$ and $B$ under the bag-containment
relationship, that is, the largest bag $C$ such that $C \subseteq A$ and $C \subseteq B$.

$$A \cup B = \mathrm{lub}_{\subseteq}(A, B)$$

where $\mathrm{lub}_{\subseteq}$ stands for the least upper bound of $A$ and $B$ under the bag-containment
relationship, that is, the smallest bag $C$ such that $A \subseteq C$ and $B \subseteq C$. With these
notions, we define the similarity of bags $A$ and $B$, $\mu(A, B)$, as

$$\mu(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad \text{where } |S| = \sum_{x \in S} o_S(x).$$

We explain how to apply these formalisms to compare the similarity of count-cubes. 

First, we view a count-cube as a bag in the following way. An element of the bag is 
identified by the list of dimension-values underlying a cell of the cube, and the count 
of that element (the copies of that element in the bag) is represented by the cell value. 
A cell value 0 means that the element is not in the bag. Thus for example, 

Cx.out(duration ‘short’, time ‘morning’, dow ‘MON’, callee ‘6175552121’) = 10

stands for a bag element with count 10. 

Next, since two count-cubes, say $C_1$ and $C_2$, can be viewed as bags, they can be
compared using the above bag-overlap based similarity measure, denoted by

$$\mu: \langle C_1, C_2 \rangle \to [0, 1].$$







Further, any sub-cube (slice or dice) of a count-cube is also a count-cube, and thus 
can be mapped to a bag. This allows the corresponding sub-cubes of two count-cubes 
to be compared based on bag-overlap. 

Given two count-cubes $C_1[D]$ and $C_2[D]$ with the same underlying dimensions
$D = (D_1, \ldots, D_n)$, computing their similarity and putting the results into a similarity
cube $C_s[D_s]$ proceeds as follows.

Define the similarity cube $C_s[D_s]$, with each cell of $C_s$ corresponding to a sub-cube of
$C_1$ (and $C_2$). This requires us to provide a mapping from a cube-cell $e$ dimensioned by
$D_s$ to a sub-cube $C'$ dimensioned by $D' \subseteq D$, expressed as

$$e[D_s] \mapsto C'[D']$$

We have introduced the following two general types of cell-to-subcube mappings.

Type A (“projection”) cell-to-subcube mapping: making $D_s \subset D$. As a result, each
cell of $C_s$ is mapped to a sub-cube of $C_1$ (or $C_2$). As an example, assume that $C_1$ and
$C_2$ are dimensioned by <duration, time, dow, callee>, but $C_s$ is dimensioned by <duration,
time, dow>. Then each cell $e$ of $C_s$ corresponds to a sub-cube of $C_1$ (and $C_2$)
based on the same values as $e$ on dimensions duration, time and dow, and the bag of
all values on dimension callee.

Type B (“change level”) cell-to-subcube mapping: having a dimension $d_s \in D_s$ associated
with a dimension $d \in D$ such that a value of $d_s$ corresponds to multiple values of
$d$. In other words, there exists a one-to-many relationship between $d_s$ and $d$. For
example, if $C_1$ and $C_2$ are dimensioned by <duration, time, dow, callee> but $C_s$ is
dimensioned by <durLevel, timeLevel, dowLevel>, and there exist many-to-one mappings
from duration, time and dow to durLevel, timeLevel and dowLevel respectively,
then each cell of $C_s$ corresponds to a sub-cube of $C_1$ (and $C_2$). These sub-cubes are
based on multiple values on dimensions duration, time and dow.

For each cell e of C_s, identify the pair of corresponding sub-cubes of C_i and C_j, say c_i and c_j. The value of e is then the similarity of c_i and c_j computed by bag-overlap, i.e., value(e) = μ(c_i, c_j).
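To make the type A mapping concrete, the following Python sketch (illustrative only) stores a count-cube as a dictionary from (duration, time, dow, callee) tuples to counts and fills a similarity cube dimensioned by <duration, time, dow>. The bag-overlap measure itself is defined earlier in the paper; the sketch assumes its generalized Jaccard form, the sum of per-element minimum counts divided by the sum of maximum counts. The function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def bag_overlap(a: Counter, b: Counter) -> float:
    """Assumed bag-overlap measure: sum of min counts / sum of max counts."""
    keys = a.keys() | b.keys()
    union = sum(max(a[k], b[k]) for k in keys)
    if union == 0:
        return 1.0  # assumption: two empty bags are treated as identical
    return sum(min(a[k], b[k]) for k in keys) / union


def type_a_similarity_cube(ci: dict, cj: dict) -> dict:
    """Project away the last dimension (callee): each cell (duration, time, dow)
    of the similarity cube compares the two bags of callee counts under it."""
    bags_i, bags_j = defaultdict(Counter), defaultdict(Counter)
    for (*cell, callee), n in ci.items():
        bags_i[tuple(cell)][callee] += n
    for (*cell, callee), n in cj.items():
        bags_j[tuple(cell)][callee] += n
    cells = bags_i.keys() | bags_j.keys()
    return {cell: bag_overlap(bags_i[cell], bags_j[cell]) for cell in cells}


# Hypothetical count-cubes dimensioned by <duration, time, dow, callee>.
ci = {("short", "morning", "MON", "555"): 3, ("short", "morning", "MON", "777"): 1}
cj = {("short", "morning", "MON", "555"): 2, ("short", "morning", "MON", "888"): 2}
cs = type_a_similarity_cube(ci, cj)
print(cs[("short", "morning", "MON")])  # matched 2 out of a union of 6
```

Only the shared callee contributes to the numerator, so calls to unmatched destinations lower the similarity without adding to it.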



3.2 Vector Distance-Based Cube Similarity 

Conceptually, cubes may also be viewed as vectors, and compared for similarity 
based on vector distance. 



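Given vectors <t_1, ..., t_n> and <t'_1, ..., t'_n>, their distance d_e may be measured by the following formula

$$d_e = \left( \sum_{i=1}^{n} (t_i - t'_i)^2 \right)^{1/2}$$

and their similarity can easily be derived from their distance. We also tested the cosine-based vector similarity measure shown below.

$$\cos(t, t') = \frac{\sum_{i=1}^{n} t_i \, t'_i}{\left( \sum_{i=1}^{n} t_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} t_i'^2 \right)^{1/2}}$$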
However, in our experience the cosine measure is less sensitive than the distance measure when comparing vectors that represent call volumes.
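The Python sketch below contrasts the two measures on hypothetical call-volume vectors. It illustrates one plausible reason for the lower sensitivity of the cosine measure on this data: cosine ignores overall magnitude, so the same mix of calls made at twice the volume looks identical, whereas the Euclidean distance registers the difference.

```python
import math


def euclidean_distance(t, u):
    """Distance (sum_i (t_i - u_i)^2)^(1/2) between equal-length vectors."""
    if len(t) != len(u):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, u)))


def cosine_similarity(t, u):
    """Cosine of the angle between t and u (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(t, u))
    norm = math.sqrt(sum(a * a for a in t)) * math.sqrt(sum(b * b for b in u))
    return dot / norm if norm else 0.0


weekly = [10, 4, 0, 2]   # hypothetical call volumes, one entry per field
doubled = [20, 8, 0, 4]  # the same calling mix at twice the volume
print(cosine_similarity(weekly, doubled))   # 1.0 up to rounding: no difference seen
print(euclidean_distance(weekly, doubled))  # sqrt(120), about 10.95
```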



Below, we explain how to view cubes as vectors and compute their similarity based 
on vector distance. 



A cube is viewed as a vector of field/value pairs in the following way. A field of the 
vector is identified by the dimension-values underlying a cube cell, and the value of 
that field is the cell value. For example, 

C_x.out(duration 'short', time 'morning', dow 'MON', callee '6175552121') = 10

stands for one vector element whose field is identified by duration='short', time='morning', dow='MON', callee='6175552121' and whose value is 10.

Viewing cubes, say C_i and C_j, as vectors allows us to compute their similarity ν(C_i, C_j) based on normalized vector distance, denoted by:

$$\nu : \langle C_i, C_j \rangle \to [0, 1]$$
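One normalization that maps the distance into [0, 1], and that agrees with the example used in the algorithm of Section 3.3, divides the distance by the sum of the two vector norms. The bound follows from the triangle inequality; the expression is undefined only when both cubes are entirely zero, a case the text does not address.

```latex
\nu(C_i, C_j) = 1 - \frac{\lVert C_i - C_j \rVert}{\lVert C_i \rVert + \lVert C_j \rVert},
\qquad
\lVert C_i - C_j \rVert \le \lVert C_i \rVert + \lVert C_j \rVert
\;\Longrightarrow\; 0 \le \nu(C_i, C_j) \le 1
```

Identical cubes give a distance of 0 and hence ν = 1.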

Further, any sub-cube (slice or dice) of a cube can also be mapped to a vector, and 
that allows each pair of corresponding sub-cubes of two cubes to be compared based 
on vector distance. 

Therefore, given two cubes C_i[D] and C_j[D] with the same underlying dimensions D = (D_1, ..., D_n), their similarity is computed, and the comparison results are put into a similarity cube C_s[D_s], as follows.

Define the similarity cube C_s[D_s], with each cell of C_s corresponding to a sub-cube of C_i (and C_j), using a type A or type B cell-to-subcube mapping as described above.

For each cell e of C_s, select the pair of corresponding sub-cubes of C_i and C_j, say c_i and c_j. Then the value of e is the similarity of c_i and c_j computed from their vector distance, i.e., value(e) = ν(c_i, c_j).

In the following sections we shall show several similarity cubes generated by com- 
paring calling pattern cubes. 



3.3 Volume-Based Calling Pattern Similarity Cubes 

Decimal cubes S_1.out and S_1.in, which measure calling pattern similarity on outgoing and incoming call volumes, respectively, are defined as

define S_1.out variable decimal <duration, time, dow>
define S_1.in variable decimal <duration, time, dow>



The algorithm for computing S_1.out is listed below. This similarity cube is dimensioned differently from the cubes being compared, C_x.out and C_y.out. Each cell e of S_1.out represents the similarity of a pair of corresponding sub-cubes of C_x.out and C_y.out. These sub-cubes are based on the same dimension values as e on duration, time and dow, but on all the values of the callee dimension. The sub-cube comparison is based on vector distance and the type A cell-to-subcube mapping.

Algorithm for generating a volume-based similarity cube

Input: volume-based calling pattern cubes C_x.out and C_y.out, dimensioned by <duration, time, dow, callee>.

Output: similarity cube S_1.out dimensioned by <duration, time, dow>, generated by comparing C_x.out and C_y.out.

Steps:

    reduce the size of dimension callee by limiting its values to
    those that make either C_x.out > 0 or C_y.out > 0; keep all values
    of dimensions duration, time and dow

    compare (in C_x.out and C_y.out) the calling volumes for each
    callee and generate a vector-based similarity measure for
    each combination of duration, time and dow:

    for each dow, time, duration
    {
        d = sqrt(total((C_x.out - C_y.out) * (C_x.out - C_y.out)))
        a = sqrt(total(C_x.out * C_x.out))
        b = sqrt(total(C_y.out * C_y.out))

        convert vector distance to similarity and take some kind of
        average measure, e.g.

        S_1.out = 1 - d/(a+b)
    }
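For readers without an OLAP server, the following Python sketch re-expresses the steps above on plain dictionaries keyed by (duration, time, dow, callee). It is a hypothetical translation, not the authors' Express code, and it assumes that a cell where both sub-cubes are empty (a + b = 0) counts as fully similar, since the algorithm does not specify that case.

```python
import math
from collections import defaultdict


def volume_similarity_cube(cx_out: dict, cy_out: dict) -> dict:
    """S_1.out[duration, time, dow] = 1 - d / (a + b), computed over callees."""
    # Step 1: keep only callees with a positive volume in either cube.
    callees = {k[3] for k, v in cx_out.items() if v > 0} | {
        k[3] for k, v in cy_out.items() if v > 0}
    # Group volumes by (duration, time, dow): each group is a vector over callees.
    groups = defaultdict(lambda: ({}, {}))
    for idx, cube in enumerate((cx_out, cy_out)):
        for (dur, tm, dow, callee), v in cube.items():
            if callee in callees:
                groups[(dur, tm, dow)][idx][callee] = v
    # Step 2: distance and norms per cell, then convert distance to similarity.
    s1_out = {}
    for cell, (x, y) in groups.items():
        d = math.sqrt(sum((x.get(c, 0) - y.get(c, 0)) ** 2 for c in callees))
        a = math.sqrt(sum(v * v for v in x.values()))
        b = math.sqrt(sum(v * v for v in y.values()))
        # Assumption: two empty sub-cubes are treated as fully similar.
        s1_out[cell] = 1.0 if a + b == 0 else 1 - d / (a + b)
    return s1_out


cx = {("short", "morning", "MON", "555"): 3}
cy = {("short", "morning", "MON", "777"): 4}
print(volume_similarity_cube(cx, cy))  # d = 5, a = 3, b = 4, so 1 - 5/7
```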



Note that the cubes being compared are already rolled up along the hierarchical dimensions duration, time and dow. This allows S_1.out to present dimensioned similarity measures at multiple levels. For example, in the following slice of S_1.out, the cell

S_1.out(duration 'all', time 'morning', dow 'week') = 0.93

represents the volume-based calling pattern similarity for mornings, based on weekly summarization, across all days and all durations.

S_1.out    DOW: week

                          DURATION
TIME          all     Short   Medium   Long
allday        0.90    0.87    0.64     0.89
Night         0.78    0.78    1.00     1.00
Morning       0.93    0.93    0.81     0.88
Afternoon     0.87    0.83    0.75     1.00
Evening       0.91    0.88    0.59     1.00
In the following slice of S_1.out, the cell S_1.out(duration 'short', time 'morning', dow 'FRI') = 0.95 represents the volume-based calling pattern similarity of short calls on Friday mornings.

S_1.out    DOW: FRI

                          DURATION
TIME          all     Short   Medium   Long
allday        0.95    0.92    0.55     0.87
Night         0.82    0.82    1.00     1.00
Morning       0.95    0.95    0.87     0.87
Afternoon     0.95    0.91    0.66     1.00
Evening       0.95    0.91    0.49     1.00



3.4 Volume-Based Calling Pattern Similarity Cube Dimensioned by Levels

Similarity cubes S_v.out (for outgoing calls) and S_v.in (for incoming calls), which also measure volume-based calling pattern similarity but are dimensioned by levels rather than by values of duration, time and dow, are defined as

define S_v.out variable decimal <durLevel, timeLevel, dowLevel>
define S_v.in variable decimal <durLevel, timeLevel, dowLevel>

Let us consider S_v.out in more detail. It is the similarity cube generated by comparing two volume-based calling pattern cubes C_x.out and C_y.out. Since S_v.out is dimensioned by <durLevel, timeLevel, dowLevel>, its cell corresponding to dimension values durLevel = L_1, timeLevel = L_2 and dowLevel = L_3 represents the similarity of a pair of corresponding sub-cubes of C_x.out and C_y.out. These sub-cubes are based on the values of dimensions duration, time and dow at levels L_1, L_2 and L_3, respectively, and on all the values of the callee dimension. To calculate a volume-based similarity cube of this kind, a type B cell-to-subcube mapping can be used, and the sub-cube comparison is based on bag-overlap. More frequently called numbers contribute more to the pattern comparison, whereas calls to unmatched destinations contribute nothing to the similarity of calling patterns.
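The type B mapping used for S_v.out can be sketched in Python as follows. The sketch assumes that the volume cubes contain rolled-up cells (duration 'all', time 'allday', dow 'week', as in the slices of S_1.out above), that each level name stands for the set of member values listed in the examples, and, as before, that bag-overlap is the sum of minimum counts over the sum of maximum counts. Level 'ww' is left out because its members are not described here; all helper names are hypothetical.

```python
from collections import Counter
from itertools import product

# Members of each level, taken from the examples and slices in the text.
LEVELS = {
    "dur_bin": {"short", "medium", "long"}, "dur_all": {"all"},
    "time_bin": {"night", "morning", "afternoon", "evening"}, "time_all": {"allday"},
    "dd": {"MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"}, "week": {"week"},
}


def roll_up(base: dict) -> Counter:
    """Add the rolled-up cells ('all', 'allday', 'week') to a base volume cube."""
    cube = Counter()
    for (dur, tm, dow, callee), n in base.items():
        for key in product((dur, "all"), (tm, "allday"), (dow, "week")):
            cube[key + (callee,)] += n
    return cube


def bag_overlap(a: Counter, b: Counter) -> float:
    """Assumed bag-overlap: sum of min counts / sum of max counts."""
    keys = a.keys() | b.keys()
    union = sum(max(a[k], b[k]) for k in keys)
    return 1.0 if union == 0 else sum(min(a[k], b[k]) for k in keys) / union


def type_b_cell(cx: Counter, cy: Counter, dur_level, time_level, dow_level) -> float:
    """Similarity of the sub-cubes selected by one <durLevel, timeLevel, dowLevel> cell."""
    def sub(cube):
        return Counter({k: n for k, n in cube.items()
                        if k[0] in LEVELS[dur_level] and k[1] in LEVELS[time_level]
                        and k[2] in LEVELS[dow_level]})
    return bag_overlap(sub(cx), sub(cy))


cx = roll_up({("short", "morning", "MON", "555"): 3})
cy = roll_up({("short", "evening", "MON", "555"): 3})
print(type_b_cell(cx, cy, "dur_bin", "time_bin", "dd"))   # 0.0
print(type_b_cell(cx, cy, "dur_all", "time_all", "week"))  # 1.0
```

Here both customers call the same number equally often, once in the morning and once in the evening, so the bin-level cell sees no overlap while the top-level cell sees a perfect match. This reproduces, on toy data, the effect discussed at the end of this section.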

As examples, let us look at two cells in the S_v.out instance shown below.

The cell

S_v.out(durLevel 'dur_bin', timeLevel 'time_bin', dowLevel 'dd') = 0.82

represents the similarity of a corresponding pair of volume-based sub-cubes of C_x.out and C_y.out. These sub-cubes are based on values 'short', 'medium' and 'long' of dimension duration at level 'dur_bin'; values 'night', 'morning', 'afternoon' and 'evening' of dimension time at level 'time_bin'; values 'MON' to 'SUN' of dimension dow at level 'dd'; and all values of dimension callee. The value of this cell is the bag-overlap-based comparison of this pair of sub-cubes.
The cell

S_v.out(durLevel 'dur_all', timeLevel 'time_all', dowLevel 'week') = 0.91

represents the similarity of a pair of sub-cubes of C_x.out and C_y.out that are based on high-level values of dimensions duration, time and dow, and all values of dimension callee.

S_v.out

DOWLEVEL: week
                       DURLEVEL
TIMELEVEL      dur_all    dur_bin
time_all       0.91       0.88
time_bin       0.91       0.87

DOWLEVEL: ww
                       DURLEVEL
TIMELEVEL      dur_all    dur_bin
time_all       0.91       0.88
time_bin       0.91       0.87

DOWLEVEL: dd
                       DURLEVEL
TIMELEVEL      dur_all    dur_bin
time_all       0.87       0.84
time_bin       0.85       0.82


In general, the degree of similarity may be higher at higher levels of the above dimensions. This is because at the time_bin and dur_bin levels, two calls to the same destination made at different times of the day and with different durations may be considered somewhat different. At the top levels of these dimensions, however, such a difference is removed, since the top-level value of the time dimension covers the whole day, and the top-level value of the duration dimension covers calls of any length.



3.5 Probability-Based Similarity Cubes

The similarity of volume-based calling patterns is meaningful only when they cover the same time span. This limitation can be eliminated by measuring the similarity of probability-based calling patterns instead. This is especially useful for comparing a preset calling pattern with an ongoing one in real time.



The following cubes measure the similarity of probability-based calling patterns. 

define S_p.out variable decimal <durLevel, timeLevel, dowLevel>
define S_p.in variable decimal <durLevel, timeLevel, dowLevel>
define S_pp.out variable decimal <durLevel, timeLevel, dowLevel>
define S_pp.in variable decimal <durLevel, timeLevel, dowLevel>

S_p.out and S_pp.out relate to outgoing calls, while S_p.in and S_pp.in relate to incoming calls. S_p.out is used to measure the similarity of calling patterns such as Cp_x.out, defined before, which take into account the dimensioned probability distribution of a customer's calls over all the calls he made. In contrast, S_pp.out is used to measure the similarity of calling patterns such as Cpp_x.out, which take into account the dimensioned probability distribution of a customer's calls dimensioned by callees. The analogous difference exists between S_p.in and S_pp.in.
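The two kinds of probability-based pattern can be derived from a volume cube roughly as sketched below. Cp_x.out and Cpp_x.out are defined earlier in the paper and are not reproduced in this section, so the sketch assumes that Cp divides each cell by the customer's total number of calls, while Cpp divides each cell by the total number of calls to the same callee. Under these assumptions both patterns are unchanged when the same behavior is observed over a longer period, which is what makes them comparable across different time spans. Function names and data are hypothetical.

```python
from collections import defaultdict


def cp(volume: dict) -> dict:
    """Assumed Cp: each cell's share of all calls the customer made."""
    total = sum(volume.values())  # assumed > 0
    return {k: v / total for k, v in volume.items()}


def cpp(volume: dict) -> dict:
    """Assumed Cpp: each cell's share of the calls made to the same callee."""
    per_callee = defaultdict(int)
    for (_dur, _tm, _dow, callee), v in volume.items():
        per_callee[callee] += v
    return {k: v / per_callee[k[3]] for k, v in volume.items()}


one_week = {("short", "morning", "MON", "555"): 3,
            ("long", "evening", "FRI", "555"): 1,
            ("short", "night", "SAT", "777"): 4}
four_weeks = {k: 4 * v for k, v in one_week.items()}  # same behavior, longer span
print(cp(one_week) == cp(four_weeks))                     # True
print(cpp(one_week)[("short", "morning", "MON", "555")])  # 3 of 4 calls to 555
```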

Like S_v.out, S_p.out and S_pp.out are also dimensioned by <durLevel, timeLevel, dowLevel>. Each of these similarity cubes is calculated by comparing two probability-based calling pattern cubes based on vector distance, using a type B cell-to-subcube mapping. The algorithm used to generate S_pp.out takes two calling pattern cubes, Cpp_x.out and Cpp_y.out (the latter defined in the same way as Cpp_x.out), as input. (Since Cpp_x.out and Cpp_y.out are “views” of C_x.out and C_y.out, the latter can also be considered input cubes.)

An instance of S_pp.out is illustrated below. Let us look at the following two cells, which are based on the same dimension values as the cells in the above examples.

The cell

S_pp.out(durLevel 'dur_bin', timeLevel 'time_bin', dowLevel 'dd') = 0.61

says that there is 61% similarity between a corresponding pair of probability-based sub-cubes of Cpp_x.out and Cpp_y.out. These sub-cubes are based on low-level values of dimensions duration, time and dow, and all values of dimension callee. The value of the above cell is the vector-based comparison of this pair of sub-cubes.

The cell

S_pp.out(durLevel 'dur_all', timeLevel 'time_all', dowLevel 'week') = 1.00

says that there is 100% similarity between a pair of sub-cubes of Cpp_x.out and Cpp_y.out that are based on high-level values of dimensions duration, time and dow, and all values of dimension callee.

S_pp.out

DOWLEVEL: week
                       DURLEVEL
TIMELEVEL      dur_all    dur_bin
time_all       1.00       0.71
time_bin       0.94       0.70

DOWLEVEL: ww
                       DURLEVEL
TIMELEVEL      dur_all    dur_bin
time_all       0.95       0.72
time_bin       0.92       0.71

DOWLEVEL: dd
                       DURLEVEL
TIMELEVEL      dur_all    dur_bin
time_all       0.77       0.63
time_bin       0.73       0.61



4 Conclusions 

Customer behavior profiling and calling pattern analysis are important for many busi- 
ness intelligence applications in telecommunications and e-commerce. However, the 
huge data volumes and flow rates require a scalable computation engine. This has 
motivated us to develop a data warehouse and OLAP based framework. 

In our approach, the OLAP engine serves as a scalable computation engine. From a performance point of view, it supports indexed caching, reduces database access dramatically and extends main-memory-based reasoning. From a functionality point of view, it allows us to deliver powerful solutions for profiling, pattern generation, analysis and comparison in a simple and flexible way. We have developed a multidimensional and multilevel cube similarity comparison formalism for comparing customer calling patterns in terms of cube manipulation. A prototype has been implemented at HP Labs on top of an Oracle-8-based data warehouse with an Oracle Express OLAP server.

Our work demonstrates the practical value of using an OLAP server as a scalable 
computation engine for creating and updating profiles, deriving calling patterns from 
profiles, as well as analyzing and comparing calling patterns. We plan to integrate the 
OLAP framework with other enabling mechanisms to support various kinds of dis- 
tributed Internet-based business intelligence applications [8,10,12]. 
Abstract. We performed association rule mining and sequential pattern mining on the access log accumulated at the NTT Software Mobile Info Search portal site. The detailed web log mining process and the rules we derived are reported in this paper. The integration of web data and relational databases enables better management of web data, and some researchers have even tried to implement applications such as web mining with SQL. Commercial RDBMSs support parallel execution of SQL, and parallelism is key to improving performance. We show that a commercial RDBMS can achieve substantial speedup for web mining.



1 Introduction 

The analysis of web logs to understand the characteristics of web users has been one of the major topics in web mining. The goal is to provide personalized information retrieval mechanisms that match the needs of each individual user. “One-to-one marketing” on the web has similar objectives. The development of web personalization technologies will certainly benefit e-commerce too.

In this paper, we focus on mining the access log using association rule discovery techniques. We show some mining results from the web log of Mobile Info Search (MIS), a location-aware search engine [18]. Usage mining of this unique site can give interesting insights into the behavior of mobile device users, who are the target users of this site.

Here we report some web mining techniques based on association rules that can be accomplished by modified SQL queries on a relational database. The integration of the web with database techniques has drawn attention from researchers. Some have proposed SQL-like query languages for the web, such as Squeal [9] and WebSQL [5]. They emphasize better organization of web data, managed in the manner of a relational database. We extend this concept to real applications of web mining. We also address the performance problem by parallelizing the execution of SQL queries.

Although the amount of log data at MIS is not so large, logs at large portal sites tend to be very large and can reach several tens of GB per day. Just one day of log data is not enough for mining. If we are going to use several weeks of logs, we have to handle more than one terabyte of data. A single PC server cannot process such a huge amount of data in a reasonable amount of time.

* IBM Japan Co., Ltd. 1-1, Nakase, Mihama-ku, Chiba-shi, Chiba 261-8522, Japan

On the other hand, most major commercial database systems have recently included capabilities to support parallelization, although no report is available on how parallelization affects the performance of the complex queries required by association rule mining. This fact motivated us to examine how efficiently SQL-based association rule mining can be parallelized and sped up using a commercial parallel database system (IBM DB2 UDB EEE). We propose two techniques to enhance the association rule mining query based on SETM [10]. We have also compared the performance with a commercial mining tool (IBM Intelligent Miner). Our performance evaluation shows that we can achieve performance comparable to the commercial mining tool using only 4 nodes. Several notable studies on efficient SQL queries for mining association rules did not examine the effect of parallelization [15][16]. Some of the authors have reported a performance evaluation on a PC cluster as a parallel platform [13][14], including a comparison with natively coded programs. Here, however, we use currently available commercial products for the evaluation.

2 Web Usage Mining for Portal Site 

2.1 Web Access Log Mining 

The access log of a web site records every user request sent to the web server. From the access log we can learn which pages were visited by the user, what kind of CGI request he submitted and when the access occurred; to some extent it also tells us where the user came from. Using this information, we can modify the web site to better satisfy the needs of users, for example by providing a better site map or by changing the layout of the pages and the placement of links. In [12], the concept of an adaptive web site, which dynamically optimizes the structure of web pages based on users' access patterns, was proposed. Some data mining techniques have been applied to web logs to predict future user behavior and to derive marketing intelligence [21][7][8]. Currently, many e-commerce applications also provide limited personalization based on access log analysis. Some pioneers, such as Amazon.com, have achieved considerable success.

Here we show the mining process for a real web site. We collaborated with NTT Software to analyze the usage of a unique search engine called Mobile Info Search.



2.2 Mobile Info Search (MIS)

Mobile Info Search (MIS) is a research project conducted by NTT Software Laboratory whose goal is to provide location-aware information from the Internet by collecting, structuring, organizing and filtering it in a practicable form [17]. MIS employs a mediator architecture between users and information sources. It mediates database-type resources, such as online maps and Internet “yellow pages”, using Location-Oriented Meta Search, and static files using Location-Oriented Robot-Based Search. Users input their location as an address, nearest station, latitude-longitude or postal code. If the user has a Personal Handy-phone System (PHS) or Global Positioning System (GPS) unit, the user's location is obtained automatically.

The site has been available to the public since 1997. Its URL is http://www.kokono.net. On average, 500 searches are performed on the site daily. A snapshot of this site is shown in Figure 1.^



[Screenshot: Mobile Info Search 2 Ver.2.00 index page. Location information (2000/09/22 16:23:08): Tokyo, Chuo-ku, Ginza, 4-chome, ZIP 104-0061 (NL 35.40.4.7, EL 139.46.5.7); nearest stations: Ginza, Higashi-Ginza. Links: Kokono (nearby area) Search, Shops Information (Internet-Townpage, keywords), Maps, Train Route, Train Timetable, Hotels, Newspapers, Weather Report, TV Guide.]

Fig. 1. Index page of Mobile Info Search 



MIS has two main functionalities:

1. Location-Oriented Meta Search

Much local information on the web is database-type; that is, the information is stored in a back-end database. In contrast to static pages, it is accessed through CGI programs on the WWW server. MIS provides a simple interface to local information services that have various search interfaces. It converts the location information and picks the suitable wrapper for the requested service. Examples of the database-type resources provided are shown in Table 1.

2. Location-Oriented Robot-Based Search “kokono Search”

kokono Search provides spatial search, which searches for documents close to a location (“kokono” is a Japanese word meaning “here”). kokono Search also employs a “robot” to collect static documents from the Internet. While other search engines provide keyword-based search, kokono Search performs location-based spatial search: it displays documents in order of the distance between the location written in the document and the user's location.
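The ordering used by kokono Search can be illustrated with a small Python sketch that ranks documents by great-circle distance from the user. How MIS actually computes distance is not described here; the sketch assumes the haversine formula, and it assumes that coordinates written like NL 35.40.4.7 are degrees, minutes and seconds with a fractional part. Function names and the document list are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def dms_to_degrees(value: str) -> float:
    """Parse 'D.M.S.frac' (e.g. '35.40.4.7') into decimal degrees (assumed format)."""
    deg, minutes, sec, frac = value.split(".")
    return int(deg) + int(minutes) / 60 + float(f"{sec}.{frac}") / 3600


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def rank_by_distance(user, documents):
    """Return document names ordered from nearest to farthest from the user."""
    return [name for name, _ in sorted(
        documents.items(), key=lambda item: haversine_km(*user, *item[1]))]


# User location as displayed for Ginza on the MIS index page.
user = (dms_to_degrees("35.40.4.7"), dms_to_degrees("139.46.5.7"))
docs = {"far_page": (35.0, 135.7), "near_page": (35.67, 139.77)}
print(rank_by_distance(user, docs))  # ['near_page', 'far_page']
```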

^ The page is shown in Japanese at http://www.kokono.net 
Table 1. Database-type resources on the Internet

Service              Location information used for the search
Maps                 longitude-latitude
Yellow Pages         address (and categories, etc.)
Train Time Tables    station
Weather Reports      address or region
Hotel Guides         nearest station



3 Mining MIS Access Log and Its Derived Rules 

3.1 Preprocessing 

We analyzed the users' searches from the access log recorded on the server between January and May 1999. There are 1,035,532 accesses in the log, but the log also contains image retrievals, searches without cookies and pages unrelated to search. Those entries were removed, leaving 25,731 search logs to be mined.

— Access Log Format 

Each search log consists of CGI parameters such as location information (address, station, zip), location acquisition method (from), resource type (submit), the name of the resource (shop_web, map_web, rail_web, station_web, tv_web) and the search condition (keyword, shop_cond). We treat those parameters in the same way as items in the transaction data of retail sales. In addition, we generate some items describing the time of access (access_week, access_hour).

Example of a search log is shown in Figure 2. 

— Taxonomy of Location 

Since place names follow a hierarchy, such as “a city is part of a prefecture” or “a town is part of a city”, we introduce a taxonomy between them. We do this by adding items for parts of the CGI parameter address. For example, if the CGI parameters contain the entry [address=Yamanashi-ken, Koufu-shi, Oosato-machi], we can add 2 items as ancestors: [address=Yamanashi-ken, Koufu-shi] at city level and [address=Yamanashi-ken] at prefecture level. In Japanese, “ken” means prefecture and “shi” means city.

— Transformation to Transaction Table 

Finally, the access log is transformed into a transaction table ready for association rule mining. The part of the transaction table that corresponds to the log entry in Figure 2 is shown in Table 2.
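The three preprocessing steps can be sketched in Python for a single search log entry that has already passed the filtering described above. The helper names, the decision to keep blank parameters such as keyword= (which appears as an item in Table 3) and the formatting of the time items are assumptions. Ancestor items end with a comma, following the form shown in Table 2.

```python
from datetime import datetime
from urllib.parse import parse_qsl


def log_to_items(query: str, timestamp: str) -> list:
    """Turn one search log entry into transaction items."""
    items = []
    for key, value in parse_qsl(query, keep_blank_values=True):
        items.append(f"{key}={value}")
        if key == "address":
            # Taxonomy of location: add city- and prefecture-level ancestors.
            parts = value.split(",")
            for depth in range(len(parts) - 1, 0, -1):
                items.append(f"address={','.join(parts[:depth])},")
    # Time-of-access items from a timestamp such as 01/Jan/1999:00:30:46 +0900.
    accessed = datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S %z")
    items.append(f"access_week={accessed.strftime('%a')}")
    items.append(f"access_hour={accessed.hour}")
    return items


def to_transaction_rows(log_id: str, user_id: str, items: list) -> list:
    """Rows of the LOG relation (Log ID, User ID, Item), as in Table 2."""
    return [(log_id, user_id, item) for item in items]


query = "address=Yamanashi-ken, Koufu-shi, Oosato-machi&from=address&keyword=&submit_map=Map"
items = log_to_items(query, "01/Jan/1999:00:30:46 +0900")
for row in to_transaction_rows("001", "003", items):
    print(row)
```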



3.2 Association Rule Mining 

Agrawal et al. [1] first introduced the problem of finding association rules in large databases. An example of association rule mining is finding “if a customer
0000000003 - - [01/Jan/1999:00:30:46 +0900] "GET /index.cgi?
sel_st=0&NL=35.37.4.289&EL=138.33.45.315&address=Yamanashi-ken,
Koufu-shi,Oosato-machi&station=Kokubo:Kaisumiyoshi:Minami-koufu:
Jouei&zip=400-0053&from=address&shop_web=townpage&keyword=
&shop_cond=blank&submit_map=Map&map_web=townpage&rail_web
=s_tranavi&station_web=ekimae&tv_web=tvguide HTTP/1.1" 200 1389
"http://www.kokono.net/mis2/mis2-header?date=1999/01/01.00:27:59
&address=Yamanashi-ken,Koufu-shi,Oosato-machi&NL=35.37.4.289&EL
=138.33.45.315&station=Kokubo:Kaisumiyoshi:Minami-koufu:Jouei&zip
=400-0053&from=address&keyword=&shop_web=townpage&shop_cond=blank
&map_web=townpage&station_web=&tv_web=tvguide" "Mozilla/4.0
(compatible; MSIE 4.01; Windows 98)" "LastPoint=NL=35.37.4.289
&EL=138.33.45.315&address=Yamanashi-ken,Koufu-shi,Oosato-machi&station
=Kokubo:Kaisumiyoshi:Minami-koufu:Jouei&zip=400-0053; LastSelect
=shop_web=townpage&shop_cond=blank&keyword=&map_web=townpage&rail_web=
s_tranavi&station_web=ekimae&tv_web=tvguide; Apache=1; MIS=1"

Fig. 2. Example of an access log 



Table 2. Representation of access log in relational database

Relation LOG

Log ID   User ID   Item
001      003       address=Yamanashi-ken, Koufu-shi, Oosato-machi
001      003       address=Yamanashi-ken, Koufu-shi,
001      003       address=Yamanashi-ken,
001      003       station=Kokubo:Kaisumiyoshi:Minami-koufu:Jouei
001      003       zip=400-0053
001      003       from=address
001      003       submit_map=Map
001      003       map_web=townpage



buys A and B, then 90% of them also buy C” in the transaction databases of large retail organizations. This 90% value is called the confidence of the rule. Another important parameter is the support of an itemset, such as {A,B,C}, which is defined as the percentage of all transactions that contain the itemset. For the above example, confidence can also be computed as support({A,B,C}) divided by support({A,B}).

We show some results in Tables 3 and 4. Besides common parameters such as confidence and support, we also use user, which indicates the percentage of users' logs that contain the rule.
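The three measures can be computed from a list of (user, itemset) transactions, as in the following Python sketch with hypothetical data. Support and confidence follow the definitions above; the user measure is interpreted here as the fraction of distinct users with at least one log containing the itemset, which is an assumption about the authors' exact definition.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(itemset <= items for _, items in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent | consequent) / support(antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


def user_ratio(itemset, transactions):
    """Assumed 'user' measure: share of users with a log containing itemset."""
    users = {user for user, _ in transactions}
    matching = {user for user, items in transactions if itemset <= items}
    return len(matching) / len(users)


logs = [("u1", {"A", "B", "C"}), ("u1", {"A", "B"}),
        ("u2", {"A", "B", "C"}), ("u3", {"C"})]
print(support({"A", "B", "C"}, logs))       # 2 of 4 transactions
print(confidence({"A", "B"}, {"C"}, logs))  # 0.5 / 0.75
print(user_ratio({"A", "B", "C"}, logs))    # 2 of 3 users
```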

Those rules can be used to improve the value of the web site. From the rules we can identify some access patterns of the users of this web site. For example, from the first rule we learn that although Akihabara is a well-known place in Tokyo for shopping for electronic appliances and parts, users who search around Akihabara station are likely to be looking for restaurants. Based on this unexpected result, we can prefetch information on restaurants around Akihabara station to reduce access time; we can also provide links for this kind of user to make navigation easier, or offer a suitable advertisement banner. In addition, learning users' behavior provides hints about business opportunities; for example, the first rule suggests a shortage of restaurants in the Akihabara area.

Table 3. Some results of MIS log mining regarding search condition

Not so many good restaurants in Akihabara?
[keyword=] [address=Tokyo,] [station=Akihabara] => [shop_cond=restaurant]

In Hokkaido, people look for gas stations at night from their address
[access_hour=20] [address=Hokkaido,] [from=address] [shop_web=townpage] => [shop_cond=gasoline]

People from Gifu-ken quite often search for restaurants
[address=Gifu-ken,] [shop_web=townpage] => [shop_cond=restaurant]

However, people from Gifu-ken search for hotels on Saturday
[access_week=Sat] [address=Gifu-ken,] [shop_web=townpage] => [shop_cond=hotel]

People from Gifu-ken must search for hotels around stations
[address=Gifu-ken,] [shop_web=townpage] [station=Kouyama] => [shop_cond=hotel]



Table 4. Some results of MIS log mining regarding time and location acquisition method

[address=Hokkaido,] [shop_cond=conveni] [shop_web=townpage] => [from=address]



3.3 Sequential Rule Mining 

The problem of mining sequential patterns in a large database of customer transactions was also introduced by Agrawal et al. [3]. The transactions are ordered by transaction time. A sequential pattern is an ordered list (sequence) of itemsets, such as “5% of customers who buy both A and B also buy C in the next transaction”.
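The support of a sequential pattern counts the customers whose time-ordered transactions contain the pattern. The Python sketch below uses the common containment test from the sequential pattern literature: each itemset of the pattern must be a subset of a distinct transaction, in the same order, with later transactions not required to be adjacent. How MIS sessions were grouped into customer sequences is not stated here, so the data is hypothetical.

```python
def contains(sequence, pattern):
    """True if each itemset of pattern is a subset of a distinct transaction
    of sequence, in order (earliest-match greedy scan)."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match a later transaction
    return True


def sequence_support(pattern, customer_sequences):
    """Fraction of customer sequences that contain pattern."""
    hits = sum(contains(s, pattern) for s in customer_sequences)
    return hits / len(customer_sequences)


customers = [
    [{"A", "B"}, {"C"}],
    [{"A"}, {"B"}, {"C"}],            # A and B never bought together
    [{"C"}, {"A", "B"}],              # C only before A and B
    [{"A", "B", "D"}, {"E"}, {"C", "F"}],
]
pattern = [{"A", "B"}, {"C"}]
print(sequence_support(pattern, customers))  # 2 of 4 customers
```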
We show some sequential patterns that might be interesting in Table 5. Some patterns indicate the behavior of users who might be planning to go shopping. From the second pattern we can infer that a significant portion of users check the weather forecast first, then look for shops in the yellow-pages service called “Townpage”, then look again for additional information in the vicinity with kokono Search, and finally confirm the exact location on the map.



Table 5. Some results of sequential pattern mining

After finding a shop, check how to get there and the weather
[submit_shop=Shop Info] => [submit_rail=Search Train] => [submit_newspaper=Newspaper] => [submit_weather=Weather Forecast]

Or decide the plan after checking the weather first
[submit_weather=Weather Forecast] => [submit_shop=Shop Info] [shop_web=townpage] => [submit_kokono=Kokono Search] => [submit_map=Map]

Looking for shops after closing time
[submit_shop=Shop Info] [access_hour=22] [access_week=Fri] => [submit_map=Map] [access_hour=22] [access_week=Fri]



4 Mining Web Log Using RDBMS 

The ability to perform web mining using standard SQL queries is the next challenge for better integration of the web and RDBMSs. This integration is essential, since better management of web data has become a necessity for large sites.

Performance has been a major problem for data mining with RDBMSs. We will show that a commercial RDBMS on a parallel platform can handle the task with sufficient performance.

4.1 Association Rule Mining Based on SQL 

A common strategy to mine association rules is:

1. Find all itemsets that have transaction support above the minimum support, usually called large itemsets.

2. Generate the desired rules using the large itemsets.

Since the first step consumes most of the processing time, the development of mining algorithms has concentrated on this step.
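Step 2 is comparatively cheap once the large itemsets and their support counts are known. A minimal Python sketch of the standard rule-generation step, assuming the counts are held in a dictionary keyed by frozenset (a hypothetical in-memory stand-in for the large itemset tables):

```python
from itertools import combinations


def generate_rules(large: dict, min_conf: float) -> list:
    """large maps frozenset itemsets to support counts. Returns
    (antecedent, consequent, confidence) for each rule meeting min_conf."""
    rules = []
    for itemset, count in large.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                # Every subset of a large itemset is itself large, so it is in `large`.
                conf = count / large[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


large = {frozenset({"A"}): 4, frozenset({"B"}): 3, frozenset({"A", "B"}): 3}
print(generate_rules(large, 0.8))  # only B => A qualifies (A => B has 0.75)
```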

In our experiments we employed three types of SQL query. The first is the standard SQL query using the SETM algorithm [10], shown in Figure 3. The second is an enhanced SQL query using a view materialization technique. The third is another enhanced SQL query using a subquery technique.
SQL query using the SETM algorithm. Transaction data is transformed into first normal form (transaction ID, item). In the first pass we simply gather the count of each item. Items that satisfy the minimum support are inserted into the large itemset table C_1, which takes the form (item, item count). SETM employs temporary tables to reuse item combinations in the next pass. In the first pass, transactions that match large itemsets are preserved in the temporary table R_1. In each subsequent pass, for example pass k, we first generate all lexicographically ordered candidate itemsets of length k into another temporary table RTMP_k by self-joining table R_(k-1), which contains transaction data of length k-1. Then we count those itemsets, and itemsets that meet the minimum support are inserted into the large itemset table C_k. Finally, transaction data R_k of length k is generated by matching items in the candidate itemset table RTMP_k with items in the large itemsets. In order to avoid excessive I/O, we disable logging during execution.



CREATE TABLE LOG (id int, item int);

-- PASS 1

CREATE TABLE C_1 (item_1 int, cnt int);
CREATE TABLE R_1 (id int, item_1 int);

INSERT INTO C_1
    SELECT item AS item_1, COUNT(*)
    FROM LOG
    GROUP BY item
    HAVING COUNT(*) >= :min_support;

INSERT INTO R_1
    SELECT p.id, p.item AS item_1
    FROM LOG p, C_1 c
    WHERE p.item = c.item_1;

-- PASS k

CREATE TABLE RTMP_k (id int, item_1 int,
    item_2 int, ..., item_k int);
CREATE TABLE C_k (item_1 int,
    item_2 int, ..., item_k int, cnt int);
CREATE TABLE R_k (id int, item_1 int,
    item_2 int, ..., item_k int);

INSERT INTO RTMP_k
    SELECT p.id, p.item_1, p.item_2, ...,
           p.item_k-1, q.item_k-1
    FROM R_k-1 p, R_k-1 q
    WHERE p.id = q.id
      AND p.item_1 = q.item_1
      AND p.item_2 = q.item_2
      ...
      AND p.item_k-2 = q.item_k-2
      AND p.item_k-1 < q.item_k-1;

INSERT INTO C_k
    SELECT item_1, item_2, ..., item_k,
           COUNT(*)
    FROM RTMP_k
    GROUP BY item_1, item_2, ..., item_k
    HAVING COUNT(*) >= :min_support;

INSERT INTO R_k
    SELECT p.id, p.item_1, p.item_2, ...,
           p.item_k
    FROM RTMP_k p, C_k c
    WHERE p.item_1 = c.item_1
      AND p.item_2 = c.item_2
      ...
      AND p.item_k = c.item_k;

DROP TABLE R_k-1;
DROP TABLE RTMP_k;



Fig. 3. SQL query to mine association rule 
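To try the Figure 3 queries without DB2, the following Python sketch runs pass 1 and pass 2 (k = 2) against an in-memory SQLite database, with the template columns written out explicitly. SQLite, the sample transactions and the minimum support count of 2 are stand-ins for the authors' setup; disabling logging is DB2-specific and is omitted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LOG (id int, item int);
CREATE TABLE C_1 (item_1 int, cnt int);
CREATE TABLE R_1 (id int, item_1 int);
CREATE TABLE RTMP_2 (id int, item_1 int, item_2 int);
CREATE TABLE C_2 (item_1 int, item_2 int, cnt int);
CREATE TABLE R_2 (id int, item_1 int, item_2 int);
""")
# Hypothetical transactions in first normal form: (transaction id, item).
con.executemany("INSERT INTO LOG VALUES (?, ?)",
                [(1, 10), (1, 20), (1, 30), (2, 10), (2, 20), (3, 10), (3, 30), (4, 20)])
params = {"min_support": 2}

# PASS 1: large 1-itemsets, then the transactions that contain them.
con.execute("""INSERT INTO C_1 SELECT item AS item_1, COUNT(*) FROM LOG
               GROUP BY item HAVING COUNT(*) >= :min_support""", params)
con.execute("""INSERT INTO R_1 SELECT p.id, p.item AS item_1
               FROM LOG p, C_1 c WHERE p.item = c.item_1""")

# PASS 2: candidates by self-join (no equality terms remain for k = 2),
# count them, then keep only the transaction rows of large 2-itemsets.
con.execute("""INSERT INTO RTMP_2 SELECT p.id, p.item_1, q.item_1
               FROM R_1 p, R_1 q
               WHERE p.id = q.id AND p.item_1 < q.item_1""")
con.execute("""INSERT INTO C_2 SELECT item_1, item_2, COUNT(*) FROM RTMP_2
               GROUP BY item_1, item_2 HAVING COUNT(*) >= :min_support""", params)
con.execute("""INSERT INTO R_2 SELECT p.id, p.item_1, p.item_2
               FROM RTMP_2 p, C_2 c
               WHERE p.item_1 = c.item_1 AND p.item_2 = c.item_2""")
con.executescript("DROP TABLE R_1; DROP TABLE RTMP_2;")

print(con.execute("SELECT * FROM C_2 ORDER BY item_1, item_2").fetchall())
```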



Enhanced SETM query using the view materialization technique. SETM has to materialize its temporary tables, namely R_k and RTMP_k. These temporary tables are required only in the next pass and are not needed for generating the rules; in fact, they can be dropped once the subsequent pass has finished. Based on this observation, we can avoid the cost of materializing these temporary tables by replacing table creation with view definitions.
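The view variant can be sketched in the same setting: SQLite driven from Python stands in for DB2, and the function name is hypothetical. R_1 and RTMP_2 are defined with CREATE VIEW. The large itemset tables C_1 and C_2 remain ordinary tables because they are needed for rule generation. The result matches the materialized version. The difference is that no CREATE TABLE or INSERT is issued for the intermediate data.

```python
import sqlite3


def setm_pass2_with_views(log_rows, min_support):
    """PASS 2 of SETM where R_1 and RTMP_2 are views instead of tables."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute("CREATE TABLE LOG (id INTEGER, item INTEGER)")
    cur.executemany("INSERT INTO LOG VALUES (?, ?)", log_rows)

    # C_1 stays a table: large itemsets are kept for rule generation
    cur.execute("CREATE TABLE C_1 (item_1 INTEGER, cnt INTEGER)")
    cur.execute("INSERT INTO C_1 SELECT item, COUNT(*) FROM LOG "
                "GROUP BY item HAVING COUNT(*) >= ?", (min_support,))

    # R_1 and RTMP_2 are consumed only by the next step, so they become views
    cur.execute("CREATE VIEW R_1 AS SELECT p.id AS id, p.item AS item_1 "
                "FROM LOG p, C_1 c WHERE p.item = c.item_1")
    cur.execute("CREATE VIEW RTMP_2 AS "
                "SELECT p.id AS id, p.item_1 AS item_1, q.item_1 AS item_2 "
                "FROM R_1 p, R_1 q WHERE p.id = q.id AND p.item_1 < q.item_1")

    cur.execute("CREATE TABLE C_2 (item_1 INTEGER, item_2 INTEGER, cnt INTEGER)")
    cur.execute("INSERT INTO C_2 SELECT item_1, item_2, COUNT(*) FROM RTMP_2 "
                "GROUP BY item_1, item_2 HAVING COUNT(*) >= ?", (min_support,))
    rows = cur.execute("SELECT item_1, item_2, cnt FROM C_2 "
                       "ORDER BY item_1, item_2").fetchall()
    con.close()
    return rows


LOG_ROWS = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 3),
            (4, 2), (4, 3), (5, 1), (5, 2), (5, 4)]
print(setm_pass2_with_views(LOG_ROWS, 2))  # [(1, 2, 3), (1, 3, 2), (2, 3, 2)]
```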



Enhanced SETM query using subquery technique. We expect a significant performance improvement from using views. However, creating a view still requires access to the system catalog and holds locks on the system catalog tables while the view is being created. We therefore go further and use subqueries instead of temporary tables, embedding the generation of item combinations directly into the query that generates the large itemsets.
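The subquery variant can be sketched in the same way, again with SQLite via Python as a stand-in and with hypothetical names. The R_1 rows are written as a derived table directly in the FROM clause of the statement that produces C_2. As a result, PASS 2 creates neither a table nor a view for intermediate data.

```python
import sqlite3

# The R_1 rows of Fig. 3, written as a derived table (a constant, not user input)
R1_SUBQUERY = ("(SELECT l.id AS id, l.item AS item_1 "
               "FROM LOG l, C_1 c WHERE l.item = c.item_1)")


def setm_pass2_with_subquery(log_rows, min_support):
    """PASS 2 of SETM with item combinations generated inside one query."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute("CREATE TABLE LOG (id INTEGER, item INTEGER)")
    cur.executemany("INSERT INTO LOG VALUES (?, ?)", log_rows)
    cur.execute("CREATE TABLE C_1 (item_1 INTEGER, cnt INTEGER)")
    cur.execute("INSERT INTO C_1 SELECT item, COUNT(*) FROM LOG "
                "GROUP BY item HAVING COUNT(*) >= ?", (min_support,))

    # Pair generation, counting and the support filter in a single statement
    cur.execute("CREATE TABLE C_2 (item_1 INTEGER, item_2 INTEGER, cnt INTEGER)")
    cur.execute(
        "INSERT INTO C_2 "
        "SELECT p.item_1, q.item_1, COUNT(*) "
        f"FROM {R1_SUBQUERY} p, {R1_SUBQUERY} q "
        "WHERE p.id = q.id AND p.item_1 < q.item_1 "
        "GROUP BY p.item_1, q.item_1 HAVING COUNT(*) >= ?",
        (min_support,))
    rows = cur.execute("SELECT item_1, item_2, cnt FROM C_2 "
                       "ORDER BY item_1, item_2").fetchall()
    con.close()
    return rows


LOG_ROWS = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 3),
            (4, 2), (4, 3), (5, 1), (5, 2), (5, 4)]
print(setm_pass2_with_subquery(LOG_ROWS, 2))  # [(1, 2, 3), (1, 3, 2), (2, 3, 2)]
```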

4.2 Parallel Execution Environment and Performance Evaluation 

In our experiments we employed a commercial parallel RDBMS, IBM DB2 UDB EEE version 6.1, on an IBM UNIX parallel server system, the IBM RS/6000 SP. The system consists of 12 nodes in a shared-nothing architecture. Each node has a 77 MHz POWER2 CPU, a 4.4 GB SCSI hard disk and 256 MB of RAM. The nodes are connected by the High Performance Switch (HPS) at 100 MB/s. For comparison with SQL-based data mining, we also ran the commercial data mining tool IBM Intelligent Miner on a single node of the RS/6000 SP.

Figure 4 shows the execution time of association rule mining for several minimum support values. To show that larger data sets can be handled with a parallel RDBMS, we used synthetic transaction data generated by the program described with the Apriori algorithm [2]. The data set contains 200,000 transactions, with an average transaction length of 10 and 2,000 distinct items. The transaction data are partitioned uniformly among the local hard disks of the processing nodes by hashing on the transaction ID. For reference, we also show the result of IBM's commercial data mining program, Intelligent Miner, on a single node.
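A small in-memory sketch shows why hashing on the transaction ID suits these queries. It is an illustration, not DB2's partitioning mechanism. If all rows of a transaction land on the same node, the self-join on p.id = q.id can be evaluated locally on each node, and only the per-itemset counts need to be combined. Here tid % n_nodes stands in for the hash function, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def partition_by_tid(log_rows, n_nodes):
    """Send each (tid, item) row to node tid % n_nodes (stand-in for the hash)."""
    parts = [[] for _ in range(n_nodes)]
    for tid, item in log_rows:
        parts[tid % n_nodes].append((tid, item))
    return parts


def local_pair_counts(rows):
    """Count 2-itemsets within each transaction, as the self-join on id would."""
    by_tid = defaultdict(set)
    for tid, item in rows:
        by_tid[tid].add(item)
    counts = Counter()
    for items in by_tid.values():
        counts.update(combinations(sorted(items), 2))
    return counts


def parallel_pair_counts(log_rows, n_nodes):
    """Count locally on every node, then merge the partial counts."""
    total = Counter()
    for part in partition_by_tid(log_rows, n_nodes):
        total.update(local_pair_counts(part))
    return total


LOG_ROWS = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 3),
            (4, 2), (4, 3), (5, 1), (5, 2), (5, 4)]
print(parallel_pair_counts(LOG_ROWS, 3) == local_pair_counts(LOG_ROWS))  # True
```

In the RDBMS, the merge step corresponds to redistributing the partial counts by itemset for the GROUP BY. The sketch simply sums them.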

Figure 4 shows the execution time for each degree of parallelization. On average, the View and Subquery SQL queries are about 6 times faster than the SETM SQL query, regardless of the number of nodes. The results are also compared with the execution time of Intelligent Miner on a single processing node. Intelligent Miner on a single node, with the transaction data stored in a flat file, is indeed much faster than the SQL queries. However, the View and Subquery SQL queries are 50% faster than Intelligent Miner on a single node when the transaction data have to be read from the RDBMS. We also demonstrated that, by activating only 4 nodes, the View and Subquery SQL queries achieve performance comparable to that of Intelligent Miner on a single node with a flat file. These results give evidence for the effectiveness of parallelizing SQL queries to mine association rules.

The speedup ratio is shown in Figure 5. It is also reasonably good. In particular, the View and Subquery SQL queries do not saturate as the number of processing nodes increases, which means that they parallelize well. With 12 nodes, execution is 11 times faster. In a parallel environment the network can become a bottleneck that degrades the speedup ratio. However, our experiments suggest that association rule mining using variants of SETM is mostly CPU bound and that network I/O is negligible.
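The speedup ratio is the single-node execution time divided by the n-node execution time. Dividing it by n gives the parallel efficiency. The sketch below applies these definitions. The timings are hypothetical values chosen only to reproduce the reported 11-fold speedup on 12 nodes; they are not read from Figure 5.

```python
def speedup(t_1, t_n):
    """Speedup ratio S(n) = T(1) / T(n)."""
    return t_1 / t_n


def efficiency(t_1, t_n, n):
    """Parallel efficiency E(n) = S(n) / n; 1.0 would mean linear speedup."""
    return speedup(t_1, t_n) / n


# Hypothetical timings (seconds) giving an 11-fold speedup on 12 nodes
print(speedup(1100.0, 100.0))                   # 11.0
print(round(efficiency(1100.0, 100.0, 12), 3))  # 0.917
```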

Here we give a thorough comparison and analysis of the three variants of the SQL query described above. The performance evaluation is done on 12 nodes, and the mining takes two passes. It is well known that in most cases the second pass generates a huge number of candidate itemsets and is therefore the most time-consuming phase [4][5]. Our results are very similar: in all three SQL queries, the bulk of the execution time belongs to PASS2. As expected, the View and Subquery SQL queries complete both their first and second passes faster than the SETM SQL query. We recorded execution traces of the three SQL queries in each pass, and the decomposition of the PASS2 execution time is shown in Figure 6. Comparing the elapsed time with the CPU time in Figure 6, we find that the two are close for the View and Subquery SQL queries. This means that these queries are CPU bound, while the SETM SQL query is not. Most of the execution time of the SETM query is spent on disk writes for creating temporary tables such as R_k and RTMP_k. We can also see that the sort time, which represents the cost of the GROUP BY aggregation, is almost equal for all three queries. In PASS2, SETM reuses the item combinations in the temporary table R_1, which was written to secondary storage in PASS1. We replace this table with a view or a subquery, so that data are transferred directly through memory from PASS1 to PASS2. Figure 6 indicates that PASS2 of the modified SQL queries reads data only from the buffer pool. Thus the disk write time of the View and Subquery SQL queries is almost negligible, whereas it dominates for the SETM SQL query. This analysis clarifies the problem with SETM and shows how its cost can be reduced in the View and Subquery SQL queries, which is the key to the performance improvement.




Fig. 4. Execution time on parallel database environment 



5 Summary 

The web has changed some paradigms of information retrieval. Providing satisfying answers to web queries is complicated by the chaotic nature of web page generation, the phenomenal growth of information on the web and the complexity of web structure. Web mining technology has shown promising results in improving the quality of information retrieval from the web. More sophisticated and complex web mining techniques will be indispensable in the future of the web. Supporting them requires powerful systems at a reasonable cost.






Fig. 5. Speedup ratio in parallel database environment (x-axis: number of nodes, 1 to 12)




Fig. 6. Decomposition of execution time of PASS2 for three types of SQL queries 



In this paper, we reported the results of mining the web access log of Mobile Info Search, a portal site for mobile users. Two techniques were used: association rule mining and sequential pattern mining. We also examined the parallelization of SQL queries to mine association rules on a commercial RDBMS (IBM DB2 UDB EEE). We showed that a good speedup ratio can be achieved, which means that the queries parallelize well. In addition, we examined two variants of the SETM SQL query that improve performance by reducing I/O cost with the view materialization or subquery technique.

We have compared the parallel implementation of SQL-based association rule mining with a commercial data mining tool (IBM Intelligent Miner). Through a real implementation, we have shown that our improved SETM queries using views or subqueries can beat the performance of the specialized tool with only 4 nodes, whereas the original SETM query would need more than 24 processing nodes to achieve the same performance. There is no need to buy an expensive data mining tool, since parallel execution of SQL comes at no extra cost. The approach is also easy to implement and portable.
A persistent myth is that database systems must be data storage systems. Most are. The relational model of data [9,21], with its emphasis on tuples and tables, perpetuates this myth. So does the popular concept of data warehousing [11,20].

But the primary function of a data system should be that of a data provider.

A data system exists to provide data to applications. The data may be stored, historical data, but they need not be. A data system should often provide current data. Database support for network applications is an excellent example. In ABR congestion control, the current value of ACR(<sender IP>) is important; in less than 10 msec it may be useless. Except for some excruciatingly detailed post-analysis of network behavior, one cannot imagine this value ever being stored.

An application can use data only if there is some appropriate language for describing what data it wants, coupled with expressions for ingesting it. SQL [3,6] emerged in the early 1980s as the dominant language for accessing and providing data in the relational model. Despite its very apparent strengths, SQL (and its various derivatives [13,29,32]) has significant drawbacks that limit its utility in an internet environment. MySQL [5,12], a specialization of SQL for network retrievals, eliminates many of the features that give standard SQL its power; it is nevertheless very popular. A better data provider is LDAP (Lightweight Directory Access Protocol) [17]. It consists of a library of relatively low-level system calls that may be strung together to search and update network directories. However, we would contend that neither it nor SNMP (Simple Network Management Protocol) [30] is a language. Both provide syntactic rules governing message format, but have little of the coherence that one would like in a database “language”.

A data provider requires a language interface for the applications that will use the data it provides. Indeed, one can regard the language as the essential component of a data provider, just as SQL is the essential component of the relational database model.¹ In the following sections we describe a data provision language, called ADAMS [23,25,28], that we have been using for nearly ten years. Because this language does not describe and reference data the way SQL does, it has encouraged underlying implementations that are not relational. One such implementation has provided nearly linear speedup in the parallel, distributed, shared-nothing applications [27] that dominate internet applications.

* Research supported in part by DOE grant DE-FG05-95ER25254.

¹ The actual underlying implementation can easily be an object-oriented database system. If SQL is the interface, it will “look” like a relational system.

At the University of Virginia we have a project to create an MIB (management information base) designed to manage real-time network data for quality of service applications. It is being tested in the department’s networks laboratory, which is equipped with three ATM switches donated by MCI WorldCom. In the next two sections, we provide an overview of selected portions of ADAMS that we believe are relevant to this and other network tasks. Perhaps the most important point we wish to make in this paper is that a coherent, tight language is the most difficult part of this project. The actual implementation is not.



1 ADAMS — The Language 



Since we approached this project with an existing database language, we provide a brief survey of its salient portions as a basis for understanding the real research. There are a few linguistic hurdles to surmount in order to understand ADAMS. First, ADAMS is an object-based language. That is, all data values are bound to objects, as in

    mary.age        IP_address.cvdt

so one must always access them with respect to an object, such as mary or IP_address. It is assumed that every object has a unique identifier, or oid. (In the ADAMS runtime system, the oid is a 64-bit string generated by the system. These oids are invisible in the language and cannot be directly manipulated. They can only be accessed by literal name or symbolic variable.) In the expression mary.age, mary is a symbolic literal denoting the object; it has the oid as its constant value. We assume that IP_address is a variable that can have many different oids assigned to it, so it can denote different objects in succession. By contrast, age and cvdt (cell variance delay tolerance) are not objects; they have no oid and denote only “values” in ADAMS. This is at variance with several object-oriented systems in which everything is an “object” [19], but in accord with Tanenbaum’s observation
observation 

“In the SNMP literature, these [state] variables are called objects, but
the term is misleading because they are not objects in the sense of an
object-oriented system.” [33]

It is important to keep the concept of an “object” distinct from that of “state”, 
or “value”. This is strongly enforced in ADAMS. 

Second, object “states”, which ADAMS calls “attributes”, are regarded exclusively as single-valued functions of a single object-valued variable. For this, ADAMS adopted the postfix functional dot notation commonly used by algebraists. That is, a primary term in ADAMS expressions takes the form <oid>.<attribute>, as in mary.age.

A language allows us to “talk” about “things” of interest. Nouns constitute the “things” of a language, and in most languages the nouns are grouped into a number of culturally determined classes.² Programming languages had to introduce unambiguous type and class structures. ADAMS has a typical object-oriented class structure that supports multiple inheritance. The major difference is that there are no explicit methods. Our objects, like most data, are purely passive; they have only attributes (or state). When an object attribute is referenced on the right side of an assignment, it is a data access operation, as in

    new_age <- mary.age + 1

When encountered on the left side, as in

    mary.age <- new_age

it is a write operation. The inspector and mutator methods of C++ [10] are implicit in the attribute name and its use within the ADAMS expression.
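The following minimal Python sketch illustrates this view of attributes under our own assumptions. Oids are small integers drawn from a counter, and each attribute keeps its graph in a dict. Reading mary.age is a function call on mary's oid, and assigning to mary.age changes the function at that point. The class and function names are hypothetical, not part of the ADAMS runtime.

```python
import itertools


class Attribute:
    """An attribute modelled as a single-valued function of one oid."""

    def __init__(self, name):
        self.name = name
        self._graph = {}                 # the function's graph: oid -> value

    def __call__(self, oid):             # right-side use: read mary.age
        return self._graph[oid]

    def assign(self, oid, value):        # left-side use: mary.age <- value
        self._graph[oid] = value


_oids = itertools.count(1)               # stand-in for system-generated oids


def new_object():
    """Return a fresh oid; the object has no storage of its own."""
    return next(_oids)


age = Attribute("age")
mary = new_object()
age.assign(mary, 41)          # mary.age <- 41
new_age = age(mary) + 1       # new_age <- mary.age + 1
age.assign(mary, new_age)     # mary.age <- new_age
print(age(mary))  # 42
```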

Where ADAMS departs from more traditional object-oriented development 
is that an ADAMS class does not define a struct (or storage structure). An at- 
tribute is not a field in a record, or column in a table. As noted above, attributes 
such as age or cvdt are regarded purely as single valued functions in both the 
ADAMS model and in its implementation. In the preceding paragraph we said 
ADAMS objects have no associated methods. It is equally correct to regard these 
functional object attributes as methods and assert that ADAMS objects are only
bundles of associated methods bound by the oid, with no other substance.
One can choose either perspective. 

A functional, as opposed to a structural, view of data has two surprising 
payoffs which were not apparent when Shipman first proposed DAPLEX in 1981 
[31]. First, in a distributed environment it is often easier to distribute processes than to distribute the data itself, which can reduce transmission cost and alleviate certain coherence problems. Second, it facilitates dynamic class modification [26]. An ADAMS class declaration includes the set of its associated attributes. A new attribute (that is, the attribute function designator) can be inserted into this set, or an old one removed, at will. Subsequent usage automatically reflects the change.
No record reformatting or indirection links are required. This kind of “on the 
fly” class schema redefinition has proven to be very valuable in the ongoing 
development of several experimental databases, which by their very nature are 
highly variable. When an initial database design isn’t quite correct, we change 
it without bringing the system down and repopulating it. 
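The following sketch illustrates why no record reformatting is needed. A class is only a named set of attribute names, and each attribute's values live in their own oid-keyed table, so adding or removing an attribute changes the set and nothing else. All names are hypothetical, and treating an undefined value as None is our simplification.

```python
class AdamsClass:
    """A class as a named, mutable set of attribute names (no record layout)."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = set(attrs)

    def add_attribute(self, attr):
        self.attrs.add(attr)

    def remove_attribute(self, attr):
        self.attrs.discard(attr)


# Each attribute function keeps its own oid -> value table; there are no records.
attribute_values = {"age": {1: 41, 2: 35}}


def lookup(cls, oid, attr):
    """Evaluate oid.attr for an object of cls; undefined values read as None."""
    if attr not in cls.attrs:
        raise AttributeError(f"{attr} is not an attribute of {cls.name}")
    return attribute_values.get(attr, {}).get(oid)


person = AdamsClass("PERSON", {"age"})
person.add_attribute("email")              # existing age values are untouched
attribute_values.setdefault("email", {})[2] = "bob@example.org"
print(lookup(person, 1, "email"))          # None: never assigned for oid 1
print(lookup(person, 2, "email"))          # bob@example.org
person.remove_attribute("email")           # later lookups of email now fail
```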

All languages have proper nouns, that is, singular objects that have specific names. Mechanisms to name objects are central to any system. In standard programming languages, naming objects that have finite scope is fairly easy because there is little chance of name conflict. Naming persistent objects of the internet or shared objects in a database is much more difficult; LDAP calls such names distinguished names, or DNs. Names must be unique, yet intelligible. It is easy to create unique synthetic identifiers such as numeric object identifiers, oids or IP addresses, but these are seldom intelligible in human discourse.³ Unfortunately,



² This informal cultural classification determines what adjectives, or attributes, can be associated with a noun.

³ Working with these kinds of unique identifiers is like coding in hex; it is occasionally necessary, but not recommended.
there are relatively few mnemonic names by which to identify an essentially unbounded world of persistent objects. SQL solves this problem by restricting persistent names to the attributes and the basic relations/tables of the system, all of which are controlled by a designated owner or system administrator.

Both file systems and the internet DNS (Domain Name System) have adopted tree-structured naming conventions in which edges are named and an object at a leaf is identified by concatenating the edge names along the unique path to that leaf.⁴ So we get longer and longer paths, much like German nouns. Eventually such names become unwieldy; otherwise we would not need a “bookmark” feature in web browsers. ADAMS employs different name disambiguation methods, including a name context scheme that is not tree structured and arbitrary name subscripting [24], as is common in mathematics.
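Path-based naming can be sketched as joining the edge names along the path from the root. The two functions below (hypothetical names) produce the root-first form used by file system directory names and the leaf-first form used by DNS names such as cs.virginia.edu. The sample path is illustrative.

```python
def big_endian(edges, sep="/"):
    """Root-first name, read left to right from the root (file system style)."""
    return sep + sep.join(edges)


def little_endian(edges, sep="."):
    """Leaf-first name, read left to right from the leaf (DNS style)."""
    return sep.join(reversed(edges))


PATH = ["edu", "virginia", "cs"]    # edge names from the root down to the leaf
print(big_endian(PATH))     # /edu/virginia/cs
print(little_endian(PATH))  # cs.virginia.edu
```

Each extra level lengthens the name by one more edge label, which is why such names grow unwieldy.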

ADAMS, like SQL, eases the unique naming problem by employing variable 
names that are allowed to range over a designated set of objects (or tuples). 
ADAMS uses explicit predicate calculus quantifiers 
(for_all X in <set> ) ... 

(exists y in <set> ) ... 

If cs.virginia.edu denotes the IP addresses managed by our departmental server, then a quality of service algorithm might seek to identify and isolate those users anticipating high peak cell rates (pcr) with a loop expression of the form

    for_all ip in cs_virginia_edu do
    {
        if (ip.pcr > 250000)
        {
            <some appropriate code>
        }
    }

A powerful variant of this construct is to loop over the attributes associated with an object.⁵ In standard ADAMS applications we often employ the following kind of language construct:

    for_all s in cs.students do
        for_all attr in STUDENT.attributes_of do
            cout << attr.name_of << " " << s.attr << endl

Observe that cs.students names a persistent set of oids; that s and attr are object variable names; that STUDENT is a class name; and that attributes_of and name_of are two predefined system attribute functions, which return a set of attributes and a string, respectively.
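For readers more at home in a general-purpose language, the nested for_all loop above corresponds roughly to the following Python sketch. The set of oids, the list standing in for STUDENT.attributes_of, and the sample values are all hypothetical. Attribute values are looked up in per-attribute tables to mirror the functional view.

```python
# cs.students: a persistent set of oids (plain ints here, in a fixed order)
cs_students = [101, 102]
# STUDENT.attributes_of: the attribute functions declared for the class
STUDENT_attributes_of = ["name", "year"]
# Each attribute function's graph, oid -> value
graphs = {"name": {101: "ann", 102: "raj"}, "year": {101: 2, 102: 3}}


def report(students, attributes, graphs):
    """Return one 'attr value' line per (student, attribute) pair."""
    lines = []
    for s in students:                 # for_all s in cs.students do
        for attr in attributes:        #   for_all attr in STUDENT.attributes_of do
            # attr.name_of << " " << s.attr
            lines.append(f"{attr} {graphs[attr][s]}")
    return lines


for line in report(cs_students, STUDENT_attributes_of, graphs):
    print(line)
```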



⁴ The direction of this path may be “big endian”, read left to right beginning with the root, as in file system directory names, or “little endian”, read left to right beginning with the leaf, as in LDAP.

⁵ In ADAMS, the attribute itself is treated as a first-class object. Only when it is evaluated with respect to a specific object do we get a value. Because attributes are first-class objects, they can have associated attributes. This provides a unique opportunity for including certain forms of metadata within a database. It may, or may not, be relevant to a network MIB.
The preceding paragraphs have suggested some of the linguistic nuances that can be found in ADAMS. A tutorial and complete grammar for ADAMS can be accessed through [23]. If one turns away from the SQL mind-set, it is possible to formulate linguistically complete, coherent languages that offer more flexible mechanisms for data provision. Network management information bases (MIBs) will need this flexibility if they are to be effective and if they are to evolve as networks and network usage evolve.



2 ADAMS — The Implementation 

ADAMS was designed as a functional database language, and it was implemented in a functional manner as well. That is, each attribute is implemented as a single-valued function of a single object oid. Because storing an attribute (on the left side of an assignment) dynamically changes the function, we cannot employ closed-form functional code. Consequently, ADAMS implements functional lookup as tree lookup. (We have chosen to use a very effective form of tree structure called an O-tree [22], but other forms such as hashed lookup could have been equally effective.)
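Because assignments keep changing each attribute function, the lookup structure must support both search and update. The sketch below uses a sorted array with binary search (Python's bisect module) as a simple stand-in. It is not an O-tree: inserting into a Python list is linear, whereas a real tree would keep updates logarithmic. The class name is hypothetical.

```python
import bisect


class OrderedAttribute:
    """Attribute function oid -> value kept in a sorted array (tree stand-in)."""

    def __init__(self):
        self._oids = []    # sorted oids
        self._vals = []    # values aligned with self._oids

    def _find(self, oid):
        i = bisect.bisect_left(self._oids, oid)
        return i, i < len(self._oids) and self._oids[i] == oid

    def __call__(self, oid):
        """Right-side use: evaluate the function at oid."""
        i, found = self._find(oid)
        if not found:
            raise KeyError(oid)
        return self._vals[i]

    def assign(self, oid, value):
        """Left-side use: redefine the function at oid."""
        i, found = self._find(oid)
        if found:
            self._vals[i] = value
        else:
            self._oids.insert(i, oid)
            self._vals.insert(i, value)


cvdt = OrderedAttribute()
cvdt.assign(17, 0.25)
cvdt.assign(4, 0.5)
cvdt.assign(17, 0.125)      # overwrite: the function changes at oid 17
print(cvdt(4), cvdt(17))    # 0.5 0.125
```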

It should be noted at the outset that this kind of functional evaluation is too slow for many of today’s single-processor, high-performance database applications: invoking and evaluating n separate functions just to access one n-tuple of data is too expensive. In a single-processor environment, ADAMS has been effective only when there is a high premium on the kinds of flexibility it offers. But there are other environments in which this functional approach is markedly superior. First, suppose that most of the data values the system is designed to provide are not stored, historical values that must be referenced, but real-time state values generated by the objects themselves. In that case, this kind of functional model and implementation precisely captures the underlying configuration.

Second, in distributed, shared-nothing environments it is often much easier to distribute functional lookup in the form of procedures than to distribute actual data.⁶ In particular, the distribution management tables are smaller. Moreover, because sets are collections of oids, not tuples, a very coarse-grained parallelism that can even cross statement boundaries is possible [14]. One need not synchronize after the execution of every operator.

Two models of parallel execution have been implemented. In the first, only 
attribute functions were distributed to k different processors [26]. In the rela- 
tional model, this would correspond to a vertical distribution of data. In [26] 
it was shown that, by overlapping system latencies, ADAMS could evaluate attribute functions on a 4-processor Intel iPSC/2 hypercube with a 7-fold speedup (with 8 processors there was an apparent 12-fold speedup). Unfortunately, this speedup was neither sustainable nor scalable.

⁶ For relatively fine-grained applications such as database lookup, RPC calls are totally inappropriate. We used the lightweight communication protocols found in [15].
Fig. 1. Scale up of a moderately complex ADAMS query



A better parallel model employs both a vertical distribution of attribute functions and a horizontal distribution of user objects [27]. Even reasonably complex queries involving the equivalent of relational joins show superlinear speedup (because of the extra cache resources brought by each additional processor) when run on 1-, 2-, 4- and 8-processor configurations. More impressively, these queries showed nearly linear scale-up, as shown in Figure 1.

Detailed examination of the internal message traffic indicates that this efficiency is achieved because (a) the client-server model employs object servers rather than the more customary page servers [16], and (b) only oids are sent between server modules, significantly reducing message traffic [27].

What the preceding discussion shows is that an object-oriented system such 
as ADAMS can be, and has been, implemented in highly distributed environ- 
ments such as a network. With this background, it should not be hard to im- 
plement ADAMS with its well defined language on top of LDAP. ADAMS ex- 
pressions correspond nicely with LDAP routines. This makes retrieval based on 
attribute values quite easy. If this were to be coupled with the generation of 
object attribute (state) values in real-time, the ADAMS architecture should be 
extremely efficient. 



3 Object Identification and Naming 

The core difference between object-oriented and relational database models is 
object identification [18]. The core difference between object-oriented database 
systems and object-oriented programming languages is the persistence of object 
identifiers and their symbolic names. In this project we discovered that the core 
difference between network data managers and self-contained database systems 
is the presence of externally assigned identifiers and names. In this section we 
expand on this theme by illustrating the language ramifications of extending 
ADAMS to manage externally assigned identifiers. 
In C++, all objects are eventually identified by virtual addresses. This works well in a procedural language whose programs include no persistent values. A language dealing with persistent data must have persistent identifiers, which are usually disk addresses. The database storage manager maintains the correspondence between these persistent addresses and the virtual addresses of the cached copies of the data. We all know how to implement storage (or cache) managers. But observe that a relational SQL system, as a language, presents the user only with a tuple space. It is not easy to interface this tuple space with the virtual address space of a procedural language, as anyone who has had to use the clumsy PL/SQL of ORACLE [4] can attest. As a language, ADAMS was specifically designed with this difficulty in mind. ADAMS identifies its persistent objects with a 64-bit internal identifier, but presents its users with a uniform “object space”. These objects are accessed through a persistent “name space”, which ADAMS makes available to the application by maintaining the persistent name-oid correspondence itself. By providing two name spaces, scalar values can be freely exchanged between the application’s volatile data space and the persistent data space of ADAMS. Objects of the host application, if any, are inaccessible to ADAMS, and vice versa. Only data values without unique identification can be exchanged.
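The two-name-space arrangement can be sketched as follows, under our own assumptions. The host program refers to persistent objects only by symbolic name, and the name-to-oid table lives inside the data manager. Only scalar values (ints, floats, strings) cross the boundary. The class and method names are hypothetical.

```python
import itertools

SCALARS = (int, float, str)


class PersistentSpace:
    """Persistent objects reachable only through symbolic names (toy model)."""

    def __init__(self):
        self._oids = itertools.count(1)   # oids never leave this object
        self._names = {}                  # persistent name space: name -> oid
        self._graphs = {}                 # attribute -> {oid: scalar value}

    def instantiate(self, name):
        self._names[name] = next(self._oids)

    def get(self, name, attr):
        return self._graphs[attr][self._names[name]]

    def set(self, name, attr, value):
        # Only values without identity may cross from the host program.
        if not isinstance(value, SCALARS):
            raise TypeError("only scalar values can be exchanged")
        self._graphs.setdefault(attr, {})[self._names[name]] = value


db = PersistentSpace()
db.instantiate("mary")
db.set("mary", "age", 41)
host_age = db.get("mary", "age") + 1   # a plain int in the host's own space
print(host_age)  # 42
```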

Although much better than PL/SQL, this interface with the host program- 
ming language is still awkward since two distinct name spaces must be main- 
tained. The difficulties of combining persistent storage with a general purpose 
language were well known by the late 1980s. See [7] for a clear exposition regarding languages such as Pascal/R and PS-Algol; it helps to explain why these languages and EXODUS [8] never attained their expected potential.

Since our 64-bit oids are completely artificial, we expected that we could easily map network IP addresses into ADAMS oids in order to present the user with the same seamless object space, with minimal modification of the ADAMS syntax.

It was not difficult to associate our artificial oids with IP addresses in much 
the same manner that we associate them with disk storage addresses. We have 
always used these oids as a kind of wrapper insulating the user code and runtime 
system from the messy details of actual disk storage. But in the course of this project we learned two lessons. First, maintaining the correspondence between oids and IP addresses worked better than we had expected. Using raw LDAP calls, server query responses averaged 108 msec, depending on network load. (The range was from 35 msec to 5 sec!) The overhead incurred by the ADAMS protocol averaged 18 msec, with oid conversion contributing most of the cost. For this application, the relative cost of providing the oid wrapper is acceptable.
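The oid wrapper for IP addresses amounts to a bidirectional mapping. The following minimal sketch assumes sequential 64-bit oids and uses Python's standard ipaddress module to reject malformed addresses. The class name and the sample address (from the 192.0.2.0/24 documentation range) are ours.

```python
import ipaddress
import itertools

OID_MASK = (1 << 64) - 1          # keep oids within 64 bits


class OidWrapper:
    """Bidirectional map between opaque 64-bit oids and IP addresses."""

    def __init__(self):
        self._next = itertools.count(1)
        self._ip_to_oid = {}
        self._oid_to_ip = {}

    def oid_for(self, ip):
        """Return the oid for ip, issuing a new one on first use."""
        addr = ipaddress.ip_address(ip)      # ValueError if malformed
        if addr not in self._ip_to_oid:
            oid = next(self._next) & OID_MASK
            self._ip_to_oid[addr] = oid
            self._oid_to_ip[oid] = addr
        return self._ip_to_oid[addr]

    def ip_for(self, oid):
        return str(self._oid_to_ip[oid])


w = OidWrapper()
oid = w.oid_for("192.0.2.10")
print(oid, w.ip_for(oid))   # 1 192.0.2.10
```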

The second lesson was that using oids to identify network objects as well as
user created objects had ramifications throughout the ADAMS language. Even 
though both are identified by an oid, they are very different kinds of objects. 
For example 
1. Class declarations had to be revisited. A typical ADAMS class declaration looks like

    PERSON isa CLASS
        having attrs = { f_name, l_name, age, gender }

The having clause identifies all the attributes associated with an object in the class. The object identified by mary would be instantiated in this class, so the expression mary.age is meaningful. To accommodate network objects we had to create an identifier_is clause, as in

    NODE isa CLASS
        having attrs = { pcr, scr, mcr, cvdt }
        identifier_is IP_address

This new clause is necessary for any class of objects which have identifiers 
that are uniquely assigned by an agency separate from the database system. 
These are similar to candidate keys of the relational model. 

ADAMS supports multiple inheritance. But, observe that any subclass (or 
superclass) of NODE must also employ IP address identifiers. Thus these ob- 
jects constitute a completely distinct hierarchy from other ADAMS objects. 

2. Object instantiation had to be changed. To instantiate mary as an element 
of the class PERSON, we use the ADAMS statement 

mary instantiates_a PERSON 

The runtime system generates a new oid which is associated with mary. On 
the other hand, NODE instantiation looks like 

    node[k] instantiates_a NODE
        IP_address_is <host_variable_name>

(The IP_address_is clause is mandatory if the class has a secondary identifier.) The runtime system first pings the node denoted by <host_variable_name> to verify that it exists. Then it issues an ADAMS oid to denote node[k]. From then on, all references to the node in the system runtime code use this oid, and all references to the node in user code use the symbolic name node[k], which replaces the standard internet DNS name.

Of course, this object instantiation may fail. The given <IP_address> may
be incorrect or the network connection may be faulty for any of a number of 
reasons. Surprisingly, this proves to be no real problem. Because there is no 
overall database administrator, and because the structure of every ADAMS 
database system can be modified by any of its users, the system had to be 
designed with fault tolerance in mind. It is assumed that any ADAMS state- 
ment may fail at run-time. This failure must not cause the application or 
the system to crash, and it should not propagate through the application 
any more than is logically required. 

3. Assignment must be refined. Consider an assignment statement such as

    mary.age <- mary.age + 1

The right-side semantics for the expression mary.age are read semantics, while the left-side semantics are write semantics, as in any procedural language. But

    node[k].cvdt <- node[k].cvdt * 0.5

may, or may not, make sense in a data provider context. If one is representing stored data about the node, then the assignment is valid. If one is providing data from these nodes, then it is meaningless. In this latter case, the right-side semantics of node[k].cvdt must involve an LDAP request to get this value. There can be no valid left-side (assignment) semantics.

4. Attribute declaration is more complicated. Here, the interpretation of at- 
tributes as functions, or operators, is useful. In addition to defining the type 
(or codomain) of an attribute, e.g. whether it is integer, real, string, or object 
valued, we have to declare its nature, that is whether it is a stored attribute 
or an accessed attribute. This follows from our analysis of the assignment 
operator above. 

Because attributes in ADAMS are themselves objects, attribute declaration is a two-step process. One must first declare the attribute class, and then instantiate individual attributes within the class. For example:

    REAL_NET_ATTR isa REAL_ATTRIBUTE
        domain_is NODE

In this declaration REAL_ATTRIBUTE is a predefined system class (or type). The domain_is clause is needed to specify external identifiers, if any.⁷ The attribute pcr would be instantiated by:

    pcr instantiates_a REAL_NET_ATTR
        right_semantics_are { <protocol> }
        left_semantics_are NULL

Now, pcr can be used in expressions as in Section 1.

5. Accommodation of network names. ADAMS has its own name space, which does not accommodate extensible, tree-structured, symbolic names. Yet these DNS names are important for managing domains, or sets, of nodes or IP addresses. Consequently, we included an external_name_is clause in object instantiation statements, as in

    cs_virginia_edu instantiates_a NODE_SET
        external_name_is "cs.virginia.edu"

This is somewhat awkward, but it works.
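Items 2 to 4 above can be sketched together in Python under stated assumptions. The reachability check is a caller-supplied function standing in for the ping. An accessed attribute calls a caller-supplied fetch function standing in for the <protocol> request, and assigning to it raises an error, mirroring left_semantics_are NULL. In the spirit of item 2's fault tolerance, a failed instantiation returns None instead of raising. All names are hypothetical.

```python
import itertools


class NoLeftSemantics(Exception):
    """Raised when assigning to an accessed attribute (left_semantics_are NULL)."""


class StoredAttribute:
    """Stored data about a node: readable and writable."""

    def __init__(self):
        self._graph = {}                       # oid -> value

    def __call__(self, oid):
        return self._graph[oid]

    def assign(self, oid, value):
        self._graph[oid] = value


class AccessedAttribute:
    """Data provided by the node itself: fetched on every read, never assigned."""

    def __init__(self, fetch):
        self._fetch = fetch                    # hypothetical stand-in for <protocol>

    def __call__(self, oid):
        return self._fetch(oid)

    def assign(self, oid, value):
        raise NoLeftSemantics("accessed attributes cannot be assigned")


class NodeSpace:
    """Symbolic node names -> oids -> externally assigned IP addresses."""

    def __init__(self, reachable):
        self._reachable = reachable            # hypothetical stand-in for the ping
        self._oids = itertools.count(1)
        self.names = {}                        # e.g. "node[1]" -> oid
        self.ip_of = {}                        # oid -> IP address

    def instantiate(self, name, ip_address):
        """Return the new oid, or None if the node cannot be verified."""
        try:
            ok = self._reachable(ip_address)
        except Exception:                      # a failed check must not crash
            ok = False
        if not ok:
            return None
        oid = next(self._oids)
        self.names[name] = oid
        self.ip_of[oid] = ip_address
        return oid


space = NodeSpace(reachable=lambda ip: ip.startswith("192.0.2."))
n1 = space.instantiate("node[1]", "192.0.2.10")
bad = space.instantiate("node[2]", "203.0.113.5")     # fails quietly
pcr = AccessedAttribute(fetch=lambda oid: 250000.0)   # fake reading
notes = StoredAttribute()
notes.assign(n1, "core switch")
print(n1, bad, pcr(n1), notes(n1))   # 1 None 250000.0 core switch
```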



4 Summary 

Years ago it was asserted that since persistence and type are “orthogonal”, one
could add persistence to a standard programming language with only a few
syntactic extensions. It was not really true. Last spring, we naively assumed that
because an object was an object, regardless of whether it is on a disk or on the
network, we should be able to make ADAMS a data provider with only a few
syntactic extensions. It was not really true either. The required syntactic extensions
proliferated through the language, as illustrated in the preceding section, until
they significantly reduced the simplicity and coherence that had characterized
it.

¹ ADAMS follows the mathematical convention; a function is defined over its domain
and returns values in its codomain. 
The important lesson of this summer’s experiment has been that understanding
the nature of objects being manipulated by the database system is crucial.
There exist at least 3 dimensions to this object “nature”:

a) whether the objects are persistent, or not; 

b) whether the objects are passive storage receptacles, or active agents; and

c) whether the objects are locally identified and named, or not. 

It seems unlikely that a language originally designed to accommodate only a 
subset of these characteristics can be reasonably extended to accommodate the 
rest. At least it was not really possible with ADAMS. 

Nevertheless, we believe that many of the linguistic components found in 
ADAMS also belong in a network management information language, or NMIL. 
And some of the implementation techniques should be considered for inclusion 
in an underlying support system. 

The features we would expect in the NMIL are: 

Object-based flavor: (possibly object-oriented) Network components should 
constitute the fundamental objects. Neither LDAP [17] nor MySQL [5] has
an object-based flavor.

Object naming: It should be possible to denote network objects with mnemon- 
ic, and even variable, names in a natural way. The object naming scheme 
must surely be compatible with the Domain Name System (DNS), but it 
is likely that a richer collection of names will be needed to reference finer 
components of the system. 

Dynamic class modification: Given the rapid development of the internet, it
is inconceivable that any static state characterization of network components
is possible. The dynamic introduction of new component classes and dynamic
modification of attribute/state structures of existing classes will be essential.

Functional emphasis: While a functional approach is probably not required, 
it seems to be more congruent with the distributed execution of the internet 
than any other data model. 

Predicate calculus syntax: Again, this probably is not mandatory. Many 
would prefer a syntax comprised of stock phrases that have limited parametric
substitution. It would be easier to learn and to optimize. But it is
hard to imagine any real reasoning about the net, or its management, that
does not rely on at least a first-order predicate system.

We have suggested that the actual underlying implementation be functional. 
That is, in lieu of an intermediate information provider for a specified domain 
of objects, or IPD, each object must itself provide the data/attributes/state 
associated with itself. This can be controversial. Some have said that having 
network objects provide current state in response to current state queries will 
place an undue demand on the service objects. We did not encounter this in our 
experimentation. In any case, allowing surrogate IPDs that simulate such direct
response should mitigate this criticism somewhat. To us, any other virtual
implementation model (however actually implemented) is inconceivable. The goal
of the MIB system is to provide state data about the objects of a network system.
What better source for this data than the objects themselves?
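Letting each object answer for its own state, with a surrogate IPD standing in where direct queries would be too costly, is essentially a proxy arrangement. The following is a minimal Java sketch under stated assumptions: `StateProvider`, `Router` and `SurrogateIpd` are hypothetical names, and the surrogate's refresh rule (re-query the object at most once per `ttlMillis`) is our own choice, since the text does not say how a surrogate would decide when to consult the object.

```java
import java.util.function.LongSupplier;

// Hypothetical interface: anything that can report its current state.
interface StateProvider {
    long currentState();
}

// A network object that answers state queries about itself directly.
class Router implements StateProvider {
    private long packetsForwarded;
    int queries;  // how often the object itself has been asked

    void forward() {
        packetsForwarded++;
    }

    public long currentState() {
        queries++;
        return packetsForwarded;
    }
}

// Hypothetical surrogate IPD: answers on the object's behalf and consults the
// object at most once per ttlMillis, reducing load on the service object.
class SurrogateIpd implements StateProvider {
    private final StateProvider target;
    private final long ttlMillis;
    private final LongSupplier clock;
    private long cached;
    private long fetchedAt;
    private boolean valid;

    SurrogateIpd(StateProvider target, long ttlMillis, LongSupplier clock) {
        this.target = target;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    public long currentState() {
        long now = clock.getAsLong();
        if (!valid || now - fetchedAt >= ttlMillis) {
            cached = target.currentState();  // refresh from the object itself
            fetchedAt = now;
            valid = true;
        }
        return cached;
    }
}
```

Passing the clock in as a `LongSupplier` keeps the sketch deterministic; a deployed surrogate would more likely read `System.currentTimeMillis`.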

Any process can access data; the I/O statements associated with even primi- 
tive languages provide this facility. The purpose of a well designed data provider 
should be to make access by an application natural, simple, and fast. 

Although, as a data provider, ADAMS was not as fast as we had desired, it 
was capable of responding in acceptable real time. But, to us, efficiency was never 
of paramount importance. For example, relational database systems [6], when they
were introduced, were very much slower than the hierarchical [1] and network 
based [2] systems they replaced. The dominance of SQL and the relational model 
arose from the fact that such a wide variety of data configurations could be 
embraced by such a simple formal model, and then accessed by a syntactically 
simple language. To provide a unified model and a relatively simple language 
that enables database systems to become multifaceted data providers is still a 
worthwhile quest. 
Abstract. This paper concentrates on the issues related to implementation of
interoperability between distributed subsystems, particularly in the context of
re-engineering and integration of several centralized legacy systems. Currently, most
interoperability techniques require the data or services to be tightly coupled to a
particular server. Furthermore, as most programmers are trained in designing
stand-alone applications, developing distributed systems proves to be time-consuming and
difficult. Here, we address those concerns by creating an interface wrapper model
that allows developers to treat distributed objects as local objects. A tool that
automatically generates Java interface wrappers from a specification
language called the Prototyping System Description Language has been developed
based on the model.



1 Introduction 

Interoperability between software systems is the ability to exchange services from 
one system to another. In order to exchange services, commands and data are relayed 
from the requesters to the service providers. Current business and military systems 
are typically 2-tier or 3-tier systems involving clients and servers, each running on 
different machines in the same or different locations. Current approaches for n-tier
systems lack standardization of protocols, data representation, invocation
techniques, etc. Other problems related to interoperability are the implementation of
distributed systems and the use of services from heterogeneous operating 
environments. These include issues concerning sharing of information amongst 
various operating systems, and the necessity for evolution of standards for using data 
of various types, sizes and byte ordering, in order to make them suitable for 
interoperation. These problems make interoperable applications difficult to construct 
and manage. 



1.1 Current State-of-the-Art Solutions 

Presently, the solutions attempting to address these interoperability problems range
from low-level sockets and messaging techniques to more sophisticated middleware
technology such as object request brokers (CORBA, DCOM). Middleware technology
uses a higher level of abstraction than messaging, and can simplify the construction of
interoperable applications. It provides a bridge between the service provider and
requester by providing standardized mechanisms that handle communication, data
exchange and type marshalling. The implementation details of the middleware are 
generally not important to developers building the systems. Instead, developers are 
concerned with service interface details. This form of information hiding enhances 
system maintainability by encapsulating the communication mechanisms and 
providing stable interface services for the developers. However, developers still need 
to perform significant work to incorporate the middleware’s services into their 
systems. Furthermore, they must have a good knowledge of how to deploy the 
middleware services to fully exploit the features provided. 

Current middleware approaches have another major design limitation: the data and
services are tightly coupled to the servers. Any attempt to parallelize or distribute a
computation across several machines therefore encounters complicated issues due to 
this tight control of the server process on the data. Tuning performance by 
redistributing processes and data over different hardware configurations requires 
much more effort for software adjustment than system administrators would like. 



1.2 Motivation 

Distributed data structures provide an entirely different paradigm. Here, data is no 
longer coupled to any particular process. Methods and services that work on the data 
are also uncoupled from any particular process. Processes can now work on different 
pieces of data at the same time. Until recently, building distributed data structures 
together with their requisite interfaces has proved to be more daunting than other 
conventional interoperability middleware techniques. The arrival of JavaSpace has 
changed the scenario to some extent. It allows easy creation and access to distributed 
objects. However, issues concerning data getting lost in the network, duplicated data 
items, out-dated data, external exception handling and handshaking communication 
between the data owner and data users are still open. Developers have to devise ways 
to solve those problems and standardize them between applications. 



1.3 Proposal 

The situation concerning interoperability would greatly improve if a developer 
working on some particular application could treat distributed objects as local objects 
within the application. The developers could then modify the distributed object as if it
were local within the process. The changes may, however, still need to be reflected in
other applications using that distributed object without creating any problems related 
to inconsistency. The current research aims at attaining this objective by creating a 
model of an interface wrapper that can be used for a variety of distributed objects. In 
addition, we seek models that can automate the process of generating the interface 
wrapper directly from the interface specification of the requirement, thereby greatly 
improving developers’ productivity. 

A tool, named the Automated Interface Codes Generator (AICG), has been developed 
to generate the interface wrapper codes for interoperability, from a specification 
language called the Prototype System Description Language (PSDL) [9]. The tool 
uses the principles of distributed data structure and JavaSpace Technology to
encapsulate transaction control, synchronization, and notification together with
lifetime control to provide an environment that treats distributed objects as if they
were local within the concerned applications.



2 Review of Previous Works 

A basic idea for enhancing interoperability is to make the network transparent to the 
application developers. Previous approaches [1] include 1) Building blocks for 
interoperability, 2) Architectures for unified, systematic interoperability and 3) 
Packaging for encapsulating interoperability services. These approaches have been
assessed and summarized using Kiviat graphs by Berzins [1] with various weight
factors. The Kiviat graphs give a good summary of the strong and weak points of 
various approaches. ORBs and Jini are currently among the promising technologies 
for interoperability. 



2.1 ORB Approaches 

There are, however, some concerns with the ORB models. Sullivan [13] provides a
more in-depth analysis of the DCOM model, highlighting the architectural conflicts
between Dynamic Interface Negotiation (how a process queries a COM interface and
its services) and Aggregation (the component composition mechanism). Interface
negotiation does not function properly within aggregation boundaries. This
problem arises because interacting components share an interface. An interface is
shared if the constructor or QueryInterface functions of several components can return
a pointer to it. QueryInterface rules state that a holder of a shared interface should be
able to obtain interfaces of all types appearing on both the inner and outer
components. However, an aggregator can refuse to provide interfaces of some types
appearing on an inner component by hiding the inner component. Thus,
QueryInterface can fail to work properly with respect to delegation to the inner
interface.
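The aggregation conflict can be mimicked with a small Java analogy. It is not COM code: `Unknown`, `IPrinter`, `IScanner`, `Inner` and `Outer` are hypothetical stand-ins for `IUnknown`, two component interfaces, and the inner and outer components. As in COM aggregation, the inner component delegates `queryInterface` to the outer (controlling) object, while the outer object hands out the inner component's `IPrinter` reference as a shared interface and hides `IScanner`.

```java
// Java analogy of the COM QueryInterface problem under aggregation; not COM
// code. All names are hypothetical stand-ins.
interface Unknown {
    // Returns an implementation of the requested interface, or null.
    <T> T queryInterface(Class<T> type);
}

interface IPrinter extends Unknown {
    String print();
}

interface IScanner extends Unknown {
    String scan();
}

// Inner component: implements both interfaces, but, as in COM aggregation,
// delegates queryInterface to its controlling (outer) object.
class Inner implements IPrinter, IScanner {
    private final Unknown controlling;

    Inner(Unknown controlling) {
        this.controlling = controlling;
    }

    public String print() {
        return "printing";
    }

    public String scan() {
        return "scanning";
    }

    public <T> T queryInterface(Class<T> type) {
        return controlling.queryInterface(type);
    }
}

// Outer aggregator: shares the inner IPrinter reference directly but hides
// the inner component's IScanner.
class Outer implements Unknown {
    private final Inner inner = new Inner(this);

    public <T> T queryInterface(Class<T> type) {
        if (IPrinter.class.equals(type)) {
            return type.cast(inner);  // the shared interface
        }
        if (Unknown.class.equals(type)) {
            return type.cast(this);   // the controlling identity
        }
        return null;                  // IScanner is hidden by the aggregator
    }
}
```

A client holding the shared `IPrinter` cannot obtain `IScanner` through it, even though the object behind that reference implements `IScanner`; this is the kind of QueryInterface failure described above.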

Hence, for the ORB approaches, detailed understanding of the techniques is required
to design a truly reliable interoperable system. Programmers, however, are trained
mostly in standalone programming techniques. Adding specialized network
programming models increases learning as well as development time, with
occasional slippage of target deadlines. Furthermore, bugs in distributed programs are
harder to detect, and the consequences of failure are more catastrophic: an abnormal
program can cause other programs to go astray in a connected distributed environment
[9], [12].



2.2 Prototyping 

The demand for large, high quality systems has increased to the point where a 
quantum change in software technology is needed [9]. Requirements and 
specification errors are a major cause of faults in complex systems. Rapid
prototyping is one of the most promising solutions to this problem. Completely
automated generation of prototypes from a very high-level language is feasible, and
generation of skeleton programming structures is currently common in the computer
world. One major advantage of the automatic generation of codes is that it frees the 
developers from the implementation details by executing specifications via reusable 
components [9]. 

In this perspective, an integrated software development environment, named 
Computer Aided Prototyping System (CAPS) has been developed at the Naval 
Postgraduate School, for rapid prototyping of hard real-time embedded software 
systems, such as missile guidance systems, space shuttle avionics systems, software 
controllers for a variety of consumer appliances and military Command, Control, 
Communication and Intelligence (C3I) systems [11]. Rapid prototyping uses rapidly 
constructed prototypes to help both the developers and their customers visualize the 
proposed system and assess its properties in an iterative process. The heart of CAPS is 
the Prototyping System Description Language (PSDL). It serves as an executable 
prototyping language at a specification and software architecture level and has 
special features for real-time system design. Building on the success of computer 
aided rapid prototyping system (CAPS) [11], the AICG model also uses PSDL for 
specification of distributed systems and automates the generation of interface codes 
with the objective of making the network transparent from the developer’s point of 
view. 



2.3 Transaction Handling 

Building a networked application is entirely different from building a stand-alone 
system in the sense that many additional issues need to be addressed for smooth 
functioning of a networked application. The networked systems are also susceptible to 
partial failures of computation, which can leave the system in an inconsistent state. 

Proper transaction handling is essential to control and maintain concurrency and 
consistency within the system. Yang has examined the limitation of hard-wiring
concurrency control into either the client or the server. He found that the scalability
and flexibility of these configurations are greatly limited. Hence, he presented a
middleware approach: an external transaction server, which carries out the
concurrency control policies in the process of obtaining the data. Advantages of this
approach are: 1) the transaction server can be easily tailored to apply the desired
concurrency control policies of specific client applications; 2) the approach does not
require any changes to the servers or clients in order to support the standard
transaction model; and 3) coordination among clients that share data but have
different concurrency control policies is possible if all of the clients use the same
transaction server.

The AICG model uses the same approach, with an external transaction manager
such as the one provided by Sun in the Jini model. All transactions used by the
clients and servers are created and overseen by the manager.
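The external-transaction-server idea can be illustrated with a minimal Java sketch. It is not Yang's server or the Jini transaction manager API; `TxManager` and `Tx` are hypothetical names. The sketch shows only the separation of roles: clients obtain transactions from a manager that is separate from both client and data store, buffer their writes there, and the shared store changes only when the manager commits. A real manager would also enforce a concurrency control policy (for example, locking), which is omitted, and each `Tx` is assumed to be used by a single client thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical external transaction manager; not Yang's server or the Jini API.
// Clients never update the store directly: they obtain a transaction from the
// manager, buffer writes in it, and ask the manager to commit or abort.
class TxManager {
    private final Map<String, Integer> store = new HashMap<>();
    private int nextId = 1;

    class Tx {
        final int id;
        private final Map<String, Integer> writes = new HashMap<>();
        private boolean open = true;

        private Tx(int id) {
            this.id = id;
        }

        void write(String key, int value) {
            if (!open) throw new IllegalStateException("transaction " + id + " is closed");
            writes.put(key, value);
        }

        // A transaction sees its own pending writes before committed data.
        Integer read(String key) {
            return writes.containsKey(key) ? writes.get(key) : TxManager.this.get(key);
        }
    }

    synchronized Tx begin() {
        return new Tx(nextId++);
    }

    // All buffered writes become visible together, under the manager's lock.
    synchronized void commit(Tx tx) {
        if (!tx.open) throw new IllegalStateException("transaction " + tx.id + " is closed");
        store.putAll(tx.writes);
        tx.open = false;
    }

    // Discards buffered writes; the shared store is untouched.
    synchronized void abort(Tx tx) {
        tx.writes.clear();
        tx.open = false;
    }

    synchronized Integer get(String key) {
        return store.get(key);
    }
}
```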
3 The AICG Interaction Model

The AICG model encapsulates some of the features of JavaSpace and Jini to provide
a simplified way of developing distributed applications.



3.1 Jini Model 

The Jini model is designed to make a service on a network available to anyone who
can reach it, and to do so in a type-safe and robust way [4]. The capabilities of the Jini
model rest on five key concepts: (1) Discovery is the process used to find communities
on the network and join them. (2) Lookup governs how the code that is needed to
use a particular service finds its way to the participants that want to use that service.
(3) Leasing is the technique that gives Jini its self-recovering ability. (4) Remote
events allow services to notify each other of changes to their state. (5) Transactions
ensure that the computations of several services and their hosts always remain in a “safe”
state.
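Leasing, concept (3) above, is also the basis of the AICG lifetime and lease control features described later. The Java sketch below is a hypothetical lease table, not the Jini leasing API: `LeaseTable` and its methods are illustrative names, and the current time is passed in explicitly so that expiry is easy to follow. A resource stays registered only while its holder keeps renewing the lease, so state left behind by a crashed or disconnected holder is reclaimed automatically.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical lease table illustrating Jini-style leasing (not the Jini API).
class LeaseTable {
    private final Map<String, Long> expiries = new HashMap<>();

    // Grants a lease that expires durationMillis after now.
    void grant(String resource, long now, long durationMillis) {
        expiries.put(resource, now + durationMillis);
    }

    // Extends a live lease; an expired or unknown lease cannot be renewed.
    boolean renew(String resource, long now, long durationMillis) {
        Long expiry = expiries.get(resource);
        if (expiry == null || expiry <= now) {
            return false;
        }
        expiries.put(resource, now + durationMillis);
        return true;
    }

    // Housekeeping: removes every resource whose lease has run out and
    // returns how many were removed.
    int purge(long now) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = expiries.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() <= now) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    boolean isLive(String resource) {
        return expiries.containsKey(resource);
    }
}
```

For example, two resources leased at time 0 for 100 ms, of which only one is renewed at time 50, leave just the renewed one alive after a purge at time 120.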

The Jini model was designed by Sun Microsystems with simplicity, reliability and 
scalability as the focus. Its vision is that Jini-enabled devices such as a PDA, a cell phone
or a printer, when plugged into a TCP/IP network, should be able to automatically
detect and collaborate with other Jini-enabled devices. 

The powerful features of Jini provide a good groundwork for developing 
interoperability systems. However, the lack of automation for creating interface 
software and the need for developers to fully understand the Jini Model before they 
can use it create the same problems for developers as other interoperability
approaches. 



3.2 The JavaSpace Model 

The JavaSpace model is a high-level coordination tool for gluing processes together in 
a distributed environment. It departs from conventional distribution techniques using 
message passing between processes or invoking methods on remote objects. The 
technology provides a fundamentally different programming model that views an 
application as a collection of processes cooperating via the flow of freshly copied 
objects into and out of one or more spaces. This space-based model of distributed 
computing or distributed structure has its roots in the Linda coordination language [3] 
developed by Dr. David Gelernter at Yale University. 

3.2.1 Distributed Data Structure and Loosely Coupled Programming 

Conceptually a distributed data structure is one that can be accessed and manipulated 
by multiple processes at the same time without regard for which machine is executing 
those processes. In most distributed computing models, distributed data structures are 
hard to achieve. Message passing and remote method invocation systems provide a 
good example of the difficulty. Most of these systems tend to keep a data structure behind
one central manager process, and processes that want to perform work on the data
structure must “wait in line” to ask the manager process to access or alter a piece of
data on their behalf. Attempts to parallelize or distribute a computation across more
than one machine face bottlenecks, since the data are tightly coupled to the one manager
process. True concurrent access is rarely achievable.

Distributed data structures provide an entirely different approach where we uncouple 
the data from any particular process. Instead of hiding data structure behind a 
manager process, we represent data structures as collections of objects that can be 
independently and concurrently accessed and altered by remote processes. Distributed 
data structures allow processes to work on the data without having to wait in line if 
there are no serialization issues. 

3.2.2 Space 

A space is a shared, network-accessible repository for objects. Processes use the 
repository as a persistent object storage and exchange mechanism. Processes perform
simple operations to write new objects into space, take objects from space, or read 
(make a copy of) objects in a space. When taking or reading objects, processes use a 
simple value-matching lookup to find the objects that matter to them. If a matching 
object is not found immediately, then a process can wait until one arrives. Unlike 
conventional object stores, processes do not modify objects in the space or invoke 
their methods directly. To modify an object, a process must explicitly remove it, 
update it, and reinsert it into the space. During the period of updating, other processes 
requesting the object will wait until the process writes the object back to the space.
This protocol for modification ensures synchronization, as there can be no way for 
more than one process to modify an object at the same time. However, it is possible 
for many processes to read the same object at the same time. 
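The three space operations and the take-modify-write protocol can be sketched in plain Java. `ToySpace` is a hypothetical, single-JVM stand-in and not the JavaSpace API: a `Predicate` replaces JavaSpace's template matching, there is no persistence or transaction support, and `read` returns the stored reference rather than a fresh copy, so entries should be immutable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy, single-JVM stand-in for a space; NOT the JavaSpace API.
class ToySpace<E> {
    private final List<E> entries = new ArrayList<>();

    // write: add a new entry and wake any process waiting for a match.
    synchronized void write(E entry) {
        entries.add(entry);
        notifyAll();
    }

    // read: return a matching entry without removing it, waiting up to
    // timeoutMillis; returns null if nothing matches in time.
    synchronized E read(Predicate<? super E> template, long timeoutMillis) {
        return find(template, timeoutMillis, false);
    }

    // take: like read, but removes the entry, so only one process can hold it.
    synchronized E take(Predicate<? super E> template, long timeoutMillis) {
        return find(template, timeoutMillis, true);
    }

    private synchronized E find(Predicate<? super E> template, long timeoutMillis,
                                boolean remove) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            for (int i = 0; i < entries.size(); i++) {
                E e = entries.get(i);
                if (template.test(e)) {
                    return remove ? entries.remove(i) : e;
                }
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null;  // no match arrived before the timeout
            }
            try {
                wait(remaining);  // released by a later write
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();  // preserve interrupt status
                return null;
            }
        }
    }
}
```

To update an entry, a process takes it, builds the new value and writes it back; while the entry is out of the space, another process's `read` waits for it (up to its timeout), which is the synchronization described above.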

Key Features of JavaSpace: 

• Spaces are persistent: Spaces provide reliable storage for objects. Once stored in 
the space, an object will remain there until a process explicitly removes it. 

• Spaces are transactionally secure: The Space technology provides a transaction 
model that ensures that an operation on a space is atomic. Transactions are 
supported for single operations on a single space, as well as multiple operations 
over one or more spaces. 

• Spaces allow exchange of executable content: While in the space, objects are just
passive data; however, when we read or take an object from a space, a local copy
of the object is created. Like any other local object, we can modify its public fields 
as well as invoke its methods. 



3.3 The AICG Approach 

The AICG approach to interoperability has two parts. The first part is to develop a 
model to completely hide the interoperability from the developers and the second part 
of the approach is to design a tool that automates the process of integrating the AICG 
model into the distributed application so as to aid the development process. 
3.3.1 The AICG Model

The AICG model is built on JavaSpace and Jini. It is designed to wrap around data 
structures or objects that are shared between concurrent applications across a network. 
The model gives the applications complete access to the contents of the objects as 
though they were the sole owners of the data. Synchronization, transaction and error 
handling are built into the model, freeing the developers to concentrate on the actual 
requirement of the applications. 

AICG uses the JavaSpace Distributed Data Structure principles as the main 
communication channel for exchange of services. The model also encompasses Jini 
services like Transaction, Leasing and Remote Event. The difference, however, is that
the model wraps the services provided by JavaSpace and Jini and hides their usage
from the application. Developers are not required to understand the underlying
principles before they can use the model. They should, however, be aware of
object-oriented programming constraints, such as the rule that the attributes of an
object may be accessed only through the object's methods.

The most common use of the AICG model is to encapsulate objects that are to be 
shared. This form of abstraction has an advantage over direct use of a JavaSpace. The 
JavaSpace distributed protocol for modification ensures synchronization by enforcing 
that a process wishing to modify the object has to physically remove it from the space, 
alter it and write it back to the space. There can be no way for more than one process 
to modify an object at the same time. However, this does not prevent other processes 
from overwriting the updated data. For example, in an ordinary JavaSpace, the 
programmer of Process A could specify a “read” operation, followed by a “write” 
operation. This would result in 2 copies of the object in the Space. The AICG model 
prevents this since the 3 basic commands are embedded into distributed objects that 
are automatically generated to conform to the proper protocol. All modifications on 
the object are automatically translated to “take”, followed by “write” and all 
operations that access the fields of the distributed object are translated to “read”. 
These ensure that local data are up-to-date and serialization is maintained. 
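The translation performed by AICG-generated wrappers (field access to “read”, modification to “take” followed by “write”) can be sketched as follows. Every name here is hypothetical: `MiniSpace` is a minimal in-process stand-in for a space, not the JavaSpace API, and `SharedCounter` only illustrates the kind of wrapper the tool would generate; it is not actual AICG output.

```java
import java.util.ArrayList;
import java.util.List;

// Immutable entry: this toy space hands out references, not copies.
final class SharedEntry {
    final String name;
    final int value;

    SharedEntry(String name, int value) {
        this.name = name;
        this.value = value;
    }
}

// Minimal in-process stand-in for a space (hypothetical; not the JavaSpace API).
class MiniSpace {
    private final List<SharedEntry> entries = new ArrayList<>();

    synchronized void write(SharedEntry e) {
        entries.add(e);
        notifyAll();  // wake processes waiting in read/take
    }

    synchronized SharedEntry read(String name) { return find(name, false); }

    synchronized SharedEntry take(String name) { return find(name, true); }

    // Counts entries with the given name (used to detect duplicate copies).
    synchronized int count(String name) {
        int n = 0;
        for (SharedEntry e : entries) {
            if (e.name.equals(name)) n++;
        }
        return n;
    }

    // Waits until an entry with this name is present; take also removes it.
    private synchronized SharedEntry find(String name, boolean remove) {
        while (true) {
            for (int i = 0; i < entries.size(); i++) {
                if (entries.get(i).name.equals(name)) {
                    return remove ? entries.remove(i) : entries.get(i);
                }
            }
            try {
                wait();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted waiting for " + name, ex);
            }
        }
    }
}

// Hypothetical generated wrapper: application code calls ordinary methods and
// never issues read/write/take itself.
class SharedCounter {
    private final MiniSpace space;
    private final String name;

    SharedCounter(MiniSpace space, String name, int initial) {
        this.space = space;
        this.name = name;
        space.write(new SharedEntry(name, initial));
    }

    // Accessing the value is translated to "read".
    int get() {
        return space.read(name).value;
    }

    // Modifying the value is translated to "take" followed by "write": while
    // the entry is out of the space, other callers of get/add wait for it.
    void add(int delta) {
        SharedEntry current = space.take(name);
        space.write(new SharedEntry(name, current.value + delta));
    }
}
```

By contrast, calling `space.read("hits")` and then `space.write(...)` directly, the unsafe sequence described above, leaves two entries named `"hits"` in the space, whereas `add` always leaves exactly one. If a process died between `take` and `write` in this sketch, the entry would be lost, which is one of the failure cases the next paragraph raises.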

Although the basic idea of the AICG model is simple, it requires many supporting 
features to make it work. Distributed objects may be lost if a process removes them 
from the space and subsequently crashes or is cut off from the network. Similarly, the 
system may enter a deadlock state if processes request more than one distributed 
object while, at the same time, holding on to distributed objects required by other 
processes. Likewise, latency and performance are very different between local access
and remote access. Those issues should not be ignored in any interoperability
technique if the systems built with it are to be robust. Remote invocation
techniques such as RPC and CORBA do not even consider performance and latency
as part of their programming model; they treat them as a “hidden” implementation detail
that programmers must implicitly be aware of and deal with, even as they preach that
accessing a remote object is similar to accessing a local object.

The AICG model has a set of four supporting modules to handle those situations. 
These modules provide transaction handling and user-defined latency to ensure 
integrity of the updates, exception handling for reporting errors and failures without 
crashing the system, a notification channel to inform the application of certain events,
and lease control for freeing up unused objects during “housekeeping”. The supporting
features are discussed in Section 5.

3.3.2 The AICG Tool 

The second part of the research aims at developing a tool that generates software
wrappers realizing the AICG model to aid the construction of distributed applications.
The tool is designed to generate interface wrappers for data structures or objects that
need to be shared, and is particularly useful for applications that can be modeled as
flows of objects through one or more servers. The tool allows developers to use all
the features of the AICG model without the need to write complicated code. This
enhances interoperability by making network and concurrency issues transparent to the
application developers.

The interface wrappers are generated from an extension of a prototype description 
language called Prototyping System Description Language (PSDL). The extended
description language (PSDL-ext) adds property definitions that are specific
to the AICG model.

Some of the salient features of the AICG model generated by the tool are: 

• Distributed objects are treated as local objects within the application process. The
application code need not depend on how the object is distributed, since the local
object copy is always kept synchronized with the distributed copy.

• Synchronization with various applications is handled automatically, since the
AICG model is based on the transaction-secure space model and all operations are
atomic. Deadlock is prevented automatically within the interface, and each object
is under thorough transaction control. Any data structure or object can be shared and
distributed, as long as it implements Java's Serializable interface.

• Every distributed object has a lifetime. The distributed object lifetime is a period 
of time guaranteed by the AICG model for storage and distribution of the object. 
The time can be set by developer. 

• All write operations are transaction secure by default. AICG transactions are based 
on the Atomicity, Consistency, Isolation, and Durability (ACID) features. 

• Clients can be informed of changes to the distributed object through the AICG 
event model. A client application can subscribe for change notification, and when 
the distributed object is modified, a separate thread is spawned to execute a 
callback method defined by the developer. 

• The wrapper code is generated from a high-level description language; hence,
it is more manageable and more maintainable.
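The change-notification feature in the list above can be approximated in plain Java. This is a hypothetical sketch, not the Jini remote event API or generated AICG code: `ObservedValue` is an illustrative name, and callbacks run in-process rather than across the network. It does follow the described behavior: a client subscribes a callback, and each modification spawns a separate thread that runs it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch of change notification (not the Jini event API).
class ObservedValue<T> {
    private final List<Consumer<T>> listeners = new CopyOnWriteArrayList<>();
    private volatile T value;

    ObservedValue(T initial) {
        value = initial;
    }

    // Registers a developer-defined callback for change notification.
    void subscribe(Consumer<T> callback) {
        listeners.add(callback);
    }

    T get() {
        return value;
    }

    // Stores the new value, then starts one thread per subscriber so that a
    // slow callback cannot delay the writer. The threads are returned so that
    // a caller can join them if it needs to wait for the callbacks.
    synchronized List<Thread> set(T newValue) {
        value = newValue;
        List<Thread> spawned = new ArrayList<>();
        for (Consumer<T> callback : listeners) {
            Thread t = new Thread(() -> callback.accept(newValue));
            t.start();
            spawned.add(t);
        }
        return spawned;
    }
}
```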



4 Types of Services 

Services can be basic raw data, messages, remote method invocation, complex data
structures, or objects with attributes and methods. The AICG model is suited for
exchange and sharing of complex data structures and objects. It can be tailored for 
raw data, messaging, and remote method invocation types of communication. 
The AICG model uses the space as a transmission medium and hence loosens the tie
between producers and consumers of services which are forced to interact indirectly 
through a space. This is a significant difference, as loosely coupled systems tend to 
be more flexible and robust. 



4.1 Overview of the PSDL Interface 

Prototype System Description Language (PSDL) provides a data flow notation
augmented by application-oriented timing and control constraints to describe a
system as a hierarchy of networks of processing units communicating via data streams
[1]. Data streams carry values of abstract types and provide error-free communication
channels. PSDL can be presented in a semi-graphical form, which makes specifications
and requirements easy to write. An introduction to the real-time aspects of PSDL
can be found in [1] and [2].

In PSDL, every computational entity such as an application, a procedure, a method or 
a distributed system is represented as an operator. It is hierarchical in nature and each 
operator can be decomposed to sub-operators and streams. Every operator is a state 
machine. Its internal states are modeled by variable sets local only to this operator. 
Operators are represented as circular icons in the PSDL graph, and are triggered by data
streams or by periodic timing constraints. When an operator is triggered, it reads one data
value from each input stream and computes the results if the execution guard or 
constraint is satisfied. The results are placed on the output streams if the output guard 
is satisfied. 
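The triggering rule for an operator can be read as a small piece of Java. This is an illustrative sketch, not code produced by CAPS: `Operator` is a hypothetical class, streams are modeled as FIFO queues (dataflow streams), guards are ordinary predicates, and periodic timing triggers are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical Java reading of a PSDL operator; not code produced by CAPS.
class Operator<I, O> {
    private final List<Queue<I>> inputs;
    private final Queue<O> output;
    private final Predicate<List<I>> executionGuard;
    private final Function<List<I>, O> compute;
    private final Predicate<O> outputGuard;

    Operator(List<Queue<I>> inputs, Queue<O> output, Predicate<List<I>> executionGuard,
             Function<List<I>, O> compute, Predicate<O> outputGuard) {
        this.inputs = inputs;
        this.output = output;
        this.executionGuard = executionGuard;
        this.compute = compute;
        this.outputGuard = outputGuard;
    }

    // Triggered only when every input holds a value (as with TRIGGERED BY ALL).
    // Reads one value from each input, computes if the execution guard holds,
    // and writes the result only if the output guard holds.
    // Returns true if the operator computed a result.
    boolean fire() {
        for (Queue<I> in : inputs) {
            if (in.isEmpty()) return false;
        }
        List<I> values = new ArrayList<>();
        for (Queue<I> in : inputs) values.add(in.poll());
        if (!executionGuard.test(values)) return false;
        O result = compute.apply(values);
        if (outputGuard.test(result)) output.add(result);
        return true;
    }
}
```

For instance, an adding operator whose output guard is `r -> r < 100` consumes the inputs 60 and 50 and computes 110, but places nothing on its output stream.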

Operators communicate via data streams. These data streams contain values that are 
instances of an abstract data type. For each stream, there are zero or more operators 
that write data on the stream and zero or more operators that read data from that 
stream. There are two kinds of streams in PSDL, dataflow and sampled streams. 
Dataflow streams act as FIFO buffers, where the data values cannot be lost or 
replicated. These streams are used to synchronize data from multiple sources. 
Consumers of dataflow streams never read an empty stream. Similarly, each value in a 
stream is read only once. The control constraint used by the PSDL to distinguish a 
stream as dataflow is “TRIGGERED BY ALL”. 

Sampled Streams act as atomic memory cells providing continuous data. Connected 
operators can write on or read from the streams at uncoordinated rates. Older data are 
lost if the producer is faster than the consumer. Absence of “TRIGGERED BY ALL” 
control constraint implies the stream is sampled. 
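The two stream kinds map naturally onto familiar Java structures. In this hypothetical sketch (not PSDL-generated code), `DataflowStream` is a blocking FIFO buffer and `SampledStream` is a single atomic cell; the class names are illustrative only.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicReference;

// Dataflow stream: a FIFO buffer. Values are neither lost nor replicated, each
// value is read exactly once, and a reader never reads an empty stream: it
// waits until a value arrives.
class DataflowStream<T> {
    private final Queue<T> buffer = new ArrayDeque<>();

    synchronized void write(T value) {
        buffer.add(value);
        notifyAll();
    }

    synchronized T read() {
        while (buffer.isEmpty()) {
            try {
                wait();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for data", ex);
            }
        }
        return buffer.remove();
    }
}

// Sampled stream: an atomic memory cell. Each write overwrites the previous
// value and reads return the latest one, so values are lost when the producer
// is faster than the consumer, and a value may be read more than once.
class SampledStream<T> {
    private final AtomicReference<T> cell;

    SampledStream(T initial) {
        cell = new AtomicReference<>(initial);
    }

    void write(T value) {
        cell.set(value);
    }

    T read() {
        return cell.get();
    }
}
```

Writing 1 and then 2 to each stream shows the difference: the dataflow stream yields 1 and then 2, while the sampled stream yields 2 on every read.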

A stream that has an initial value is known as a state stream. State
streams are declared in the specification of the parent operator and are represented by
thicker lines in the PSDL graph. State streams correspond to spaces that contain
objects intended to be updated.

Dataflow and sampled streams are mapped onto space-based communication by
treating the services, in this case the communication streams, as objects to be
shared.
4.2 Benefit of Loosely Coupled Communication

In tightly coupled systems, the communication process needs answers to the
questions of “who” to send to, “where” the receiving parties are located, and “when”
the messages need to be sent. The “who” is which processes, the “where” is which
machines, and the “when” is right now or later. All three must be specified explicitly
for the message to be delivered. Hence, in a distributed environment, for a
producer and a consumer to communicate successfully, they must know each other’s
identity and location, and must be running at the same time. This tight coupling leads
to inflexible applications that are not mobile and are difficult to build, debug,
and change. In loosely coupled systems, the questions “who?”, “where?” and “when?”
are answered with “anyone”, “anywhere” and “anytime”.

“Anyone”: Producers and consumers do not need to know each other’s identities, but 
can instead communicate anonymously. In the sampled stream mapping, the
producers place a message entry into the space without knowing who will read
it. Similarly, the consumers read the message entry from the space
without concern for the identity of the producers.

“Anywhere”: Producers and consumers can be located anywhere, as long as they have 
access to an agreed-upon space for exchanging messages. The producer does not need 
to know the consumer’s location. Conversely, the consumer picks up the message
from the space using associative lookup and need not be aware of the producer’s
location. This is especially useful when producers and consumers roam from
machine to machine, because the space-based programs do not need to change.

“Anytime”: With space-based communication, producers and consumers are able to
communicate even if they do not exist at the same time, because message entries
persist in the space. This works well when the producers and the consumers operate
asynchronously (sampled streams). Synchronous communication works as well:
the space is also an event-driven repository and can trigger the consumers
whenever new entries are created in the space. This decoupling in time is useful
because it allows operators to be scheduled flexibly to accommodate real-time
constraints.
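The three properties can be illustrated with a toy, single-process stand-in for a space. `TinySpace` and its `write`/`readIfExists`/`takeIfExists` methods are hypothetical and only echo the write/read/take vocabulary of JavaSpaces; a real space is a network service with leases and transactions. The point of the sketch is that the producer never names a consumer, the consumer matches by content, and the entry outlives the producer's activity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/**
 * Hypothetical in-process stand-in for a space, for illustration only; it is not
 * the JavaSpaces API. Entries are matched by content, never by sender identity.
 */
class TinySpace<E> {
    private final List<E> entries = new ArrayList<>();

    /** Producer side: store an entry without naming any consumer. */
    synchronized void write(E entry) {
        entries.add(entry);
    }

    /** Non-destructive associative lookup (analogous to a space "read"). */
    synchronized Optional<E> readIfExists(Predicate<? super E> template) {
        return entries.stream().filter(template).findFirst();
    }

    /** Destructive associative lookup (analogous to a space "take"). */
    synchronized Optional<E> takeIfExists(Predicate<? super E> template) {
        for (int i = 0; i < entries.size(); i++) {
            if (template.test(entries.get(i))) {
                return Optional.of(entries.remove(i));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        TinySpace<String> space = new TinySpace<>();
        // The producer writes and finishes before any consumer exists ("anytime").
        Runnable producer = () -> space.write("track:42");
        producer.run();
        // A consumer started later finds the entry by content ("anyone", "anywhere").
        System.out.println(space.takeIfExists(m -> m.startsWith("track:")).orElse("none")); // track:42
    }
}
```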



5 How AICG Unifies Localized and Distributed Systems 

The AICG model aims at bridging the differences between localized and distributed 
systems by simplifying the distributed model and encapsulating all the necessary 
elements of the distributed systems into the wrapper interfaces. 



5.1 Localized and Distributed Systems 

The major differences between localized and distributed systems concern the areas of 
latency, memory access, partial failure, and concurrency. Most interoperability
techniques try to hide the network and simplify the problem by asserting that the locations
of the software components do not affect the correctness of the computations, only their
performance. These techniques concentrate on packing data into
portable forms, invoking a remote method somewhere on the network,
and so forth. However, latency, performance, partial failure, and concurrency are
characteristics of distributed systems that also need serious attention.

5.1.1 Latency and Memory Access 

The most obvious difference between accessing a local object and accessing a remote 
object has to do with the latency of the two calls. The difference between the two is 
currently between four and five orders of magnitude. In the AICG model’s vision of a
unified object, remote access is actually a three-step process: step one retrieves the
remote object from the space, step two executes the method of the remote object
locally, and step three returns the object to the space if it has been modified.
Developers must be aware of the latency and performance concerns. To ensure that 
the developers are aware of the issues, the AICG model requires the developers to 
specify the maximum latency period before an exception is raised. This forces the 
developers to consider the latency issues for the type of data and methods that are to 
be shared. 
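The three-step access path with a developer-declared latency bound might look like the following sketch for a modifying method. Everything here is an assumption for illustration: a `BlockingQueue` holding the single copy stands in for the space, `ThreeStepAccess.modify` is a hypothetical helper, and `LatencyExceededException` is an invented name rather than one of the AICG exception codes.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

/** Hypothetical exception raised when the declared latency bound is exceeded. */
class LatencyExceededException extends RuntimeException {
    LatencyExceededException(String message) {
        super(message);
    }
}

/** Sketch of retrieve / execute locally / return, bounded by a maximum latency. */
final class ThreeStepAccess {
    private ThreeStepAccess() {}

    static <T> T modify(BlockingQueue<T> space, long maxLatencyMs, UnaryOperator<T> method) {
        T local;
        try {
            // Step 1: retrieve the object, waiting no longer than the latency bound.
            local = space.poll(maxLatencyMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new LatencyExceededException("interrupted while waiting for the object");
        }
        if (local == null) {
            throw new LatencyExceededException("object not available within " + maxLatencyMs + " ms");
        }
        T result = local;
        try {
            result = method.apply(local); // Step 2: execute the method locally.
        } finally {
            space.add(result);            // Step 3: return the object (original if step 2 threw).
        }
        return result;
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> space = new LinkedBlockingQueue<>();
        space.add(10);
        System.out.println(modify(space, 500, x -> x + 1)); // 11
        try {
            modify(new LinkedBlockingQueue<Integer>(), 50, x -> x);
        } catch (LatencyExceededException e) {
            System.out.println(e.getMessage()); // object not available within 50 ms
        }
    }
}
```

If the method throws, the `finally` clause still returns the original object to the space, so a failed invocation does not remove the only copy.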

Another fundamental difference between local and remote computing concerns access 
to memory, specifically in the use of pointers. Simply stated, pointers are valid only 
within the local address space. There are two solutions: either all memory access
must be controlled by the underlying system, or the developers must be aware of the
different types of access, local or remote.

Using the object-oriented paradigm to the fullest is a way of eliminating the boundary 
between the local and remote computing. However, it requires the developers to build 
an application that is entirely object-oriented. Such a unified model is difficult to 
enforce. The AICG solution to this issue is to enforce the object-oriented paradigm
only on distributed objects. The automatically generated distributed-object wrapper
forces all access to the actual shared object to go through the wrapper, which is always
a local object, eliminating direct references to the actual object itself. This promotes
and enforces the principle that “remote access and local access are exactly the same”.

5.1.2 Partial Failure and Concurrency 

In local systems, failures are usually total, affecting all the components of the
system working together in an application. In distributed systems, one subsystem can
fail while the others continue. Moreover, the failure of a network link is
indistinguishable from the failure of the system at the other end of the link. The system
may still function under partial failure if the crashed components are unimportant.
However, partial failure is difficult to detect, since there is no common agent able to
determine which systems have failed, and this may drive the entire system into
unstable states.

In the AICG model, which uses the loosely coupled paradigm, a component failure can
affect the distributed system when a process retrieves objects from the space and
then crashes before returning them to the space. The AICG model resolves this issue
by performing updates of distributed objects under transaction control and by allowing
the developers to specify a useful lifetime, or lease, for each object. When a
lease expires, the object is automatically removed from the space.

Distributed objects by their nature must handle concurrent access and invocations. 
Invocations are usually asynchronous and difficult to model in distributed systems. 
Most models leave concurrency issues to the developer’s discretion during
implementation. However, this should be an interface issue and not solely an
implementation issue, since concurrency can be dealt with only by passing
information from one object to another through the agency of the interface. The
AICG model handles concurrency by design, since there is only one copy of each
distributed object at a time in the entire distributed system. Processes are made to wait
if the shared objects are not available in the space.
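Why a single copy yields mutual exclusion can be shown with a counter whose only copy lives in a one-element queue: `take` removes it, so every other process waits until the holder writes it back. `SingleCopyCounter` is a hypothetical illustration of the principle under that assumption, not AICG code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch: the only copy of a shared counter lives in a one-element "space". */
final class SingleCopyCounter {
    private final BlockingQueue<Integer> space = new LinkedBlockingQueue<>();

    SingleCopyCounter(int initial) {
        space.add(initial);
    }

    /** Takes the only copy (others wait), modifies it locally, and writes it back. */
    void increment() {
        int value;
        try {
            value = space.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the shared object", e);
        }
        space.add(value + 1); // unbounded queue: the write-back never blocks or fails
    }

    /** Reads the value; only meaningful when no process currently holds the object. */
    int valueWhenIdle() {
        Integer value = space.peek();
        if (value == null) {
            throw new IllegalStateException("object is currently taken by another process");
        }
        return value;
    }

    /** Runs several threads that increment concurrently, then returns the final value. */
    static int runInParallel(int threads, int incrementsPerThread) {
        SingleCopyCounter counter = new SingleCopyCounter(0);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.valueWhenIdle();
    }

    public static void main(String[] args) {
        System.out.println(runInParallel(4, 1000)); // 4000: no update is lost
    }
}
```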



5.2 Transaction 

Transaction control must validate operations to ensure consistency of the data, 
particularly when there are consistency constraints that link the states of several 
objects. The AICG model implements the transaction feature with the Jini
Transaction model and provides a simplified interface for the developers.

5.2.1 Jini Transaction Model 

All transactions are overseen by a transaction manager. When a distributed 
application needs operations to occur in a transaction-secure manner, the process asks
the transaction manager to create a transaction. Once a transaction has been created, 
one or more processes can perform operations under the transaction. A transaction can 
complete in two ways. If a transaction commits successfully, then all operations 
performed under it are complete. However, if problems arise, then the transaction is 
aborted and none of the operations occur. These semantics are provided by a two- 
phase commit protocol that is performed by the transaction manager as it interacts 
with the transaction participants. 
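The two-phase commit semantics can be sketched as a coordinator that asks every participant to prepare and commits only on a unanimous yes. The `Participant` interface and `TwoPhaseCoordinator` class are hypothetical simplifications; Jini defines its own `TransactionManager` and `TransactionParticipant` interfaces, which also handle leases, crash recovery, and remote calls omitted here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical participant in a two-phase commit (a simplification, not the Jini interface). */
interface Participant {
    boolean prepare(); // phase 1: vote to commit (true) or abort (false)
    void commit();     // phase 2, after a unanimous "yes"
    void abort();      // phase 2, otherwise
}

/** Minimal coordinator: either all operations happen, or none do. */
final class TwoPhaseCoordinator {
    private final List<Participant> participants = new ArrayList<>();

    void join(Participant participant) {
        participants.add(participant);
    }

    /** Returns true if the transaction committed, false if it was aborted. */
    boolean complete() {
        for (Participant p : participants) {
            if (!p.prepare()) {
                participants.forEach(Participant::abort); // one "no" vote aborts everyone
                return false;
            }
        }
        participants.forEach(Participant::commit);
        return true;
    }
}

/** Test double that records which outcome it was told. */
final class RecordingParticipant implements Participant {
    private final boolean vote;
    String outcome = "pending";

    RecordingParticipant(boolean vote) {
        this.vote = vote;
    }

    @Override public boolean prepare() { return vote; }
    @Override public void commit() { outcome = "committed"; }
    @Override public void abort() { outcome = "aborted"; }
}
```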

5.2.2 AICG Transaction Model 

The AICG model encapsulates and manages the transaction procedures. Operations on
a distributed object can be performed either with or without transaction control.
Operations under transaction control are given a default lease of six seconds, which
the user may override. The transaction manager keeps each transaction as a leased
resource, and if the lease expires before the operation has committed, the
transaction manager aborts the transaction.

By default, the AICG model enables transactions for all write operations, with a
transaction lease time of six seconds. The developer can modify the lease time
through the PSDL SPACE transactiontime property.

Read operations in the AICG model do not have transactions enabled by
default. However, the user can enable them by setting the transactiontime property to
the upper limit on transaction time for the read operation.
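The defaults described above reduce to a small decision rule. The sketch encodes it under assumptions: `OperationType`, `TransactionPolicy`, and the treatment of constructors (no transaction by default) are hypothetical, and lease expiry is checked against explicit millisecond timestamps so that the rule is easy to test.

```java
import java.util.OptionalLong;

/** Operation types declared in the PSDL implementation section. */
enum OperationType { CONSTRUCTOR, READ, WRITE }

/** Hypothetical encoding of the AICG transaction defaults described in the text. */
final class TransactionPolicy {
    static final long DEFAULT_WRITE_LEASE_MS = 6_000; // six-second default for writes

    private TransactionPolicy() {}

    /**
     * Returns the transaction lease for an operation, or empty if it runs without a transaction.
     *
     * @param transactionTimeMs the PSDL transactiontime property, or null if it was not given
     */
    static OptionalLong leaseFor(OperationType op, Long transactionTimeMs) {
        if (transactionTimeMs != null) {
            return OptionalLong.of(transactionTimeMs);      // explicit transactiontime wins
        }
        if (op == OperationType.WRITE) {
            return OptionalLong.of(DEFAULT_WRITE_LEASE_MS); // writes are transactional by default
        }
        return OptionalLong.empty(); // reads (and, by assumption, constructors) run without one
    }

    /** True while the lease is still running; otherwise the manager aborts instead of committing. */
    static boolean mayCommit(long startedAtMs, long leaseMs, long nowMs) {
        return nowMs - startedAtMs < leaseMs;
    }
}
```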
5.3 Object Life Time (Leases/Timeout)

Leasing provides a methodology for controlling the life span of the distributed objects 
in the AICG space. This allows resources to be freed after a fixed period. This model 
is beneficial in a distributed environment, where partial failure can cause holders of
resources to fail, thereby disconnecting them from the resources before they can
explicitly free them. In the absence of a leasing model, resource usage could grow 
without bound. 

There are other constructive ways to harness the benefit of the leasing model besides
using it as a garbage collector. For example, in a real-time system, information
about some distributed objects becomes useless after certain deadlines, and
accessing such obsolete information can be actively damaging. By
setting the lease on the distributed object, the AICG model automatically removes the
object once the lease expires or the deadline is reached.

JavaSpaces services allocate resources that are tied to leases. When a distributed object is
written into a space, it is granted a lease that specifies a period for which the space 
guarantees its storage. The holder of the lease may renew or cancel the lease before it 
expires. If the leaseholder does neither, the lease simply expires, and the space 
removes the entry from its store. 

Generally, a distributed object that is not a part of a transaction lasts forever as long as 
the space exists, even if the leaseholder (the process that creates the object) has died. 
This configuration is enabled by setting the SPACE lease property in the 
Implementation to 0. 

In a real-time environment, a distributed object lasts for a fixed duration of x ms
specified by the object designer. To keep the object alive, a write operation must be
performed on the object before the lease expires. This configuration is set by assigning
the required time, in ms, to the SPACE lease property in the Implementation.

If an object has a lifetime, it must be renewed before it expires. In the AICG model, 
renewal is achieved by calling any method that modifies the object. If no modification 
is required, the developer can provide a dummy method with the spacemode set to 
“write”. Invoking that method will automatically renew the lease. 
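The lease conventions above (a lease of 0 keeps the object for as long as the space exists, any other value is a lifetime in ms, and any write-mode method renews it) can be modeled as follows. `LeasedEntry` is hypothetical, and time is passed in explicitly rather than read from a clock so that expiry is deterministic.

```java
/** Sketch of an entry lease following the convention that a lease of 0 never expires. */
final class LeasedEntry<T> {
    private T value;
    private final long leaseMs;
    private long expiresAtMs;

    LeasedEntry(T value, long leaseMs, long nowMs) {
        this.value = value;
        this.leaseMs = leaseMs;
        renew(nowMs);
    }

    /** Any write-mode method renews the lease (a "dummy" write with the same value works too). */
    void write(T newValue, long nowMs) {
        if (isExpired(nowMs)) {
            throw new IllegalStateException("lease expired; entry already removed from the space");
        }
        value = newValue;
        renew(nowMs);
    }

    T read(long nowMs) {
        if (isExpired(nowMs)) {
            throw new IllegalStateException("lease expired");
        }
        return value;
    }

    boolean isExpired(long nowMs) {
        return leaseMs != 0 && nowMs >= expiresAtMs; // 0 means "lasts as long as the space"
    }

    private void renew(long nowMs) {
        expiresAtMs = nowMs + leaseMs;
    }
}
```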



5.4 AICG Event Notification 

In a loosely coupled distributed environment, it is desirable for an application to react
to changes or to the arrival of new distributed objects instead of “busy waiting” for them
through polling. AICG provides this feature by introducing a callback mechanism that
invokes user-defined methods when certain conditions are met. 

Java provides a simple but powerful event model based on event sources, event 
listeners and event objects. An event source is any object that “fires” an event, usually 
based on some internal state change in the object. In this case, writing an object into 
space would generate an event. An event listener is an object that listens for events 
fired by an event source. Typically, an event source provides a method whereby 
listeners can request to be added to a list of listeners. Whenever an event source fires
an event, it notifies each of its registered listeners by calling a method on the listener 
object and passing it an event object. 
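The event-source/listener/event-object pattern described above can be sketched with the standard `java.util.EventObject` and `java.util.EventListener` types. The class names `EntryWrittenEvent`, `EntryWrittenListener`, and `EntrySource` are hypothetical; writing a key stands in for writing an object into a space.

```java
import java.util.ArrayList;
import java.util.EventListener;
import java.util.EventObject;
import java.util.List;

/** Event object: describes an entry that was written by the source. */
class EntryWrittenEvent extends EventObject {
    private static final long serialVersionUID = 1L;
    private final String entryKey;

    EntryWrittenEvent(Object source, String entryKey) {
        super(source);
        this.entryKey = entryKey;
    }

    String entryKey() {
        return entryKey;
    }
}

/** Listener: the method the source calls when it fires. */
interface EntryWrittenListener extends EventListener {
    void entryWritten(EntryWrittenEvent event);
}

/** Event source: changes internal state, then notifies every registered listener. */
class EntrySource {
    private final List<String> entries = new ArrayList<>();
    private final List<EntryWrittenListener> listeners = new ArrayList<>();

    void addEntryWrittenListener(EntryWrittenListener listener) {
        listeners.add(listener);
    }

    void write(String key) {
        entries.add(key);                              // internal state change
        EntryWrittenEvent event = new EntryWrittenEvent(this, key);
        for (EntryWrittenListener listener : listeners) {
            listener.entryWritten(event);              // fire: call each listener in turn
        }
    }
}
```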

Within a Java Virtual Machine (JVM), an application is guaranteed not to
miss an event fired from within. Distributed events, on the other hand, have to travel
either from one JVM to another JVM within a machine or between machines
networked together. Events traveling from one JVM to another may be lost in transit,
or may never reach their event listener. Likewise, an event may reach its listener more 
than once. 
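Because a remote event may arrive more than once or not at all, a listener should be idempotent and able to notice gaps. Jini's `RemoteEvent` carries a sequence number for this kind of bookkeeping; the sketch assumes, for simplicity, that sequence numbers start at 0 and increase by one per event, and `DedupingListener` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongConsumer;

/**
 * Hypothetical idempotent listener. Assumes sequence numbers start at 0 and
 * increase by one per event, so gaps indicate lost events.
 */
final class DedupingListener {
    private final Set<Long> delivered = new HashSet<>(); // grows without bound in this sketch
    private final LongConsumer handler;
    private long highestSeen = -1;

    DedupingListener(LongConsumer handler) {
        this.handler = handler;
    }

    /** Returns true if the event was new and passed to the application. */
    synchronized boolean onEvent(long sequenceNumber) {
        if (!delivered.add(sequenceNumber)) {
            return false; // duplicate delivery: ignore
        }
        highestSeen = Math.max(highestSeen, sequenceNumber);
        handler.accept(sequenceNumber);
        return true;
    }

    /** Number of sequence numbers below the highest one seen that never arrived. */
    synchronized long missedSoFar() {
        return (highestSeen + 1) - delivered.size();
    }
}
```

A listener that detects a gap could, for instance, re-read the relevant entries from the space rather than rely on the lost event.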

Space-based distributed events are built on top of the Jini Distributed Event model, 
and the AICG event model further extends it. When using the AICG event model, the 
space is an event source that fires events when entries are written into the space 
matching a certain template an application is interested in. When the event fires, the 
space sends a remote event object to the listener. The event listener code is found in
one of the generated AICG interface wrapper files. Upon receiving an event, the
listener spawns a new thread to process the event and invoke the application
callback method. This allows the application code to be executed without involving
the developer in the process of event management.
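The source side of this arrangement, a space that fires an event for every written entry matching a registered template, can be sketched as follows. `NotifyingSpace` and `register` are hypothetical; the real JavaSpaces `notify` call takes an entry template, a transaction, a `RemoteEventListener`, a lease duration, and a handback object. Callbacks are handed to a pluggable `Executor`, so a deployment could pass `r -> new Thread(r).start()` to spawn a thread per event, as the text describes, while a test can run them inline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical space that fires an event for each written entry matching a registered template. */
final class NotifyingSpace<E> {
    private final List<E> entries = new ArrayList<>();
    private final List<Predicate<? super E>> templates = new ArrayList<>();
    private final List<Consumer<? super E>> listeners = new ArrayList<>();
    private final Executor dispatcher;

    NotifyingSpace(Executor dispatcher) {
        this.dispatcher = dispatcher;
    }

    /** Registers interest in entries matching the template. */
    synchronized void register(Predicate<? super E> template, Consumer<? super E> listener) {
        templates.add(template);
        listeners.add(listener);
    }

    /** Stores the entry and returns how many listeners were notified. */
    int write(E entry) {
        List<Consumer<? super E>> matched = new ArrayList<>();
        synchronized (this) {
            entries.add(entry);
            for (int i = 0; i < templates.size(); i++) {
                if (templates.get(i).test(entry)) {
                    matched.add(listeners.get(i));
                }
            }
        }
        // Fire outside the lock; each callback is handed to the dispatcher.
        for (Consumer<? super E> listener : matched) {
            dispatcher.execute(() -> listener.accept(entry));
        }
        return matched.size();
    }
}
```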

The distributed objects must have the SPACE Notification property set to yes.
One of the application classes must implement (in Java terms, inherit from)
notifyAICG. notifyAICG has only one method, which is the callback
method. The user class must override this method with the code that needs to be
executed when an event fires.



6 Developing Distributed Application with the AICG Tool 

This section describes the steps for developing distributed applications using the 
AICG model. An example of a C4ISR application is introduced in section 6.2 to aid 
the explanation of the process. 

6.1 Development Process 

The developer starts the development process by defining shared objects using the 
Prototyping System Description Language (PSDL). The PSDL is processed through a 
code generator (PSDLtoSpace) to produce a set of interface wrappers. The
interface wrappers contain the code needed for interaction between the application
and the space, so that the developers need not be concerned with writing objects to
and removing them from the space. The developers can treat shared or distributed
objects as local objects, while synchronization and distribution are automatically
handled by the interface code.
6.2 Input Definition to the Code Generator

The following example demonstrates the development of one of the many distributed 
objects in a C4ISR system. Airplane positions picked up from sensors are processed 
to produce track objects. These objects are distributed over a large network and used 
by several client stations for displaying the positions of planes. Each track or plane
is identified by a track number. The tracks are ‘owned’ by a group of track servers, and
only the track servers can update the track positions and attributes. The clients only
have read access to the track data. The PSDL code defines (1) the track object and
(2) the track_list object, with the corresponding methods. AICG uses an extended
version of the original PSDL grammar to model the interactions between applications
in an entire distributed system.

The track PSDL starts with the definition of a type called track. It has only one
identification field, tracknumber. The track objects can of course have more than
one field, but only one field is used in this case to uniquely identify any particular
track object. The type track_list, on the other hand, does not need an identification
field, since there is only one track_list object in the whole system. Track_list is used
to keep a list of the tracknumbers of all the active tracks in the system at each
moment in time.

All the operators (methods) of the type are defined immediately after the specification. 
Each method has a list of input and output parameters that define the arguments of the 
method. The most important portion of the method declaration is the implementation.
The developer must define the type of operation the method is supposed to
perform. The operation types are constructor (used to initialize the class), read (no
field in the class is modified) and write (one or more fields in the class are
modified). These types are necessary because the generated code encapsulates the
synchronization of the distributed objects.

The other field in the implementation portion of the method is transactiontime,
which defines the upper limit, in milliseconds, within which the operation
must be completed.

Upon running the example through the generator tool, a set of Java interface wrapper
files is produced. Developers can ignore most of the generated files except the
following:

• Track.java: this file contains the skeleton of the fields and the methods of the
track class. The user is expected to fill in the bodies of the methods.

• TrackExtClient.java: this is the wrapper class that the client initializes and uses 
instead of the track class. 

• TrackExtServer.java: this is the wrapper class that the server initializes and uses 
instead of the track class. 

• NotifyAICG.java: this class must be extended or implemented by the application
if event notification and callback are needed.
The methods found in the trackExtClient and trackExtServer have the same method
names and signatures as the track class. In fact, the track class methods are called 
within trackExtClient or trackExtServer. 



7 AICG Wrapper Design 

This section explains the design of the AICG model and the code that is generated by the
psdl2java program.



7.1 AICG Wrapper Architecture 

The generated AICG wrapper code consists of four main module types: the
interface modules, the event modules, the transaction modules, and the exception
module. The interface modules implement the distributed object methods and
communicate directly with the application. In reference to the example in section 6.2,
the interface modules are entryAICG, track, trackExt, trackExtClient, and trackExtServer.
Instead of creating the actual object (track), the application should instantiate the 
corresponding interface object, either the trackExtClient or trackExtServer. Event 
modules (eventAICGID, eventAICGHandler, notifyAICG) handle external events 
generated from the JavaSpace that are of interest to the application. Transaction 
modules (transactionAICG, transactionManagerAICG) support the interface module 
with transaction services. Lastly, the exception module (exceptionAICG) defines the 
possible types of exceptions that can be raised and need to be captured by the 
application. 

Each time the application instantiates a track class by creating a new trackExtServer, 
the following events take place in the Interface: 

1. An Entry object is created together with the track object by the trackExtServer.
The track object is placed into the Entry object and stored in the space.

2. Transaction Manager is enabled. 

3. The reference pointer to trackExtServer is returned to the application. 

Each time a method (getID, getCallsign, getPosition) that does not modify the 
contents of the object is invoked, the following events take place in the Interface: 

1. The application invokes the method through the Interface
(trackExtServer/trackExtClient). 

2. The Interface performs a Space “get” operation to update the local copy. 

3. The method is then executed on the updated copy of the object to return the value 
back to the application. 

Each time a method (setCallsign, setPosition), which does modify the contents of the 
object is invoked, the following events take place in the Interface: 

1. The application invokes the method through the Interface.

2. The interface performs a Space “take” operation, which retrieves the object from 
the space. 

3. The actual object method is then invoked to perform the modification. 
4. Upon completion of the modification, the object is returned to the space by the
interface using a “write” operation. 
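The constructor, read, and write sequences above can be condensed into a single sketch. `Track`, `MiniSpace`, and `TrackExtServerSketch` are hypothetical stand-ins, not the generated `TrackExtServer`; the map-backed space throws instead of blocking when an object is missing, and the transaction manager step is omitted.

```java
import java.util.HashMap;
import java.util.Map;

/** The shared object; in AICG the developer fills in these bodies (fields are illustrative). */
final class Track {
    private final int trackNumber;
    private String callsign;

    Track(int trackNumber, String callsign) {
        this.trackNumber = trackNumber;
        this.callsign = callsign;
    }

    Track copy() { return new Track(trackNumber, callsign); }
    int getID() { return trackNumber; }
    String getCallsign() { return callsign; }                        // read-mode method
    void setCallsign(String callsign) { this.callsign = callsign; }  // write-mode method
}

/** Hypothetical map-backed stand-in for the space, keyed by track number. */
final class MiniSpace {
    private final Map<Integer, Track> store = new HashMap<>();

    synchronized void write(Track t) {
        if (store.putIfAbsent(t.getID(), t.copy()) != null) {
            throw new IllegalStateException("object with this key already exists");
        }
    }

    synchronized Track read(int id) {  // "get": returns a copy, the entry stays in the space
        Track t = store.get(id);
        if (t == null) throw new IllegalStateException("object not found");
        return t.copy();
    }

    synchronized Track take(int id) {  // "take": removes the entry from the space
        Track t = store.remove(id);
        if (t == null) throw new IllegalStateException("object not found");
        return t;
    }
}

/** Sketch of a server-side wrapper exposing the same method names as Track. */
final class TrackExtServerSketch {
    private final MiniSpace space;
    private final int id;

    TrackExtServerSketch(MiniSpace space, int id, String callsign) {
        this.space = space;
        this.id = id;
        space.write(new Track(id, callsign)); // create the object and store it in the space
    }

    String getCallsign() {
        Track local = space.read(id);         // refresh the local copy
        return local.getCallsign();           // run the read method on that copy
    }

    void setCallsign(String callsign) {
        Track shared = space.take(id);        // remove the only copy from the space
        try {
            shared.setCallsign(callsign);     // invoke the actual method
        } finally {
            space.write(shared);              // return the object to the space
        }
    }
}
```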



7.2 Interface Modules 

The interface modules consist of the following: an entry (entryAICG) that is
stored in the space, the actual object (track) that is shared, and the object wrappers
(trackExt, trackExtClient, trackExtServer).

7.2.1 Entry 

A space stores entries. An entry is a collection of typed objects that implements the 
Entry interface. The Entry interface is empty; it has no methods that have to be 
implemented. Empty interfaces are often referred to as “marker” interfaces because 
they are used to mark a class as suitable for some role. That is exactly what the Entry 
interface is used for, to mark a class appropriate for use within a space. 

All entries in the AICG model extend the entryAICG base class. It has one main public
attribute, an identifier, and an abstract method that returns the object. Any type of
object can be stored in the entry, the only limitation being that the object must be
serializable. The serializable property allows the Java virtual machine to pass the
entire object by value instead of by reference.

All Entry attributes are declared as publicly accessible. Although public fields are
not typical of the object-oriented programming style, associative lookup is the way
space-based programs locate entries in the space: to locate an object in the space,
a template is specified that matches the contents of the fields. Declaring entry
fields public allows the space to compare and locate the object.
AICG encourages object-oriented programming style by encapsulating the actual data 
object into the entry. The object attributes can then be declared as private and made 
accessible only through clearly defined public methods of the object. 
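Associative lookup over public fields can be sketched with reflection. Following the JavaSpaces convention, a `null` field in the template acts as a wildcard and every non-null field must match. `EntryAICGSketch` and `TemplateMatcher` are hypothetical, and real spaces compare fields by their serialized forms rather than with `equals`, a simplification made here.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Hypothetical entry: public object-typed fields so that templates can match on them. */
class EntryAICGSketch {
    public Integer id;      // key used for associative lookup
    public Object payload;  // the encapsulated (serializable) shared object

    public EntryAICGSketch() {} // entries need a public no-argument constructor

    EntryAICGSketch(Integer id, Object payload) {
        this.id = id;
        this.payload = payload;
    }
}

/** Template matching: null template fields are wildcards; non-null fields must be equal. */
final class TemplateMatcher {
    private TemplateMatcher() {}

    static boolean matches(Object template, Object entry) {
        if (!template.getClass().isInstance(entry)) {
            return false; // an entry must be of the template's type or a subtype
        }
        try {
            for (Field f : template.getClass().getFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                Object wanted = f.get(template);
                if (wanted != null && !wanted.equals(f.get(entry))) {
                    return false;
                }
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```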

7.2.2 Serialization 

Each distributed interface object is a local object that acts as a proxy to the remote 
space object. It is not a reference to a remote object; instead, all operations and
values pass through the proxy to the remote space. All the objects must be
serializable to meet this objective. The Serializable interface is a “marker”
interface that contains no methods and serves only to mark a class as appropriate for
serialization. Classes marked as serializable should not contain pointers in their
representation, since a pointer is valid only within the local address space.
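Pass-by-value through serialization can be demonstrated with the standard `ObjectOutputStream`/`ObjectInputStream` round trip, which yields an independent copy. `ByValue.copy` and `Position` are hypothetical names used only for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Serializes and deserializes an object, yielding an independent copy (pass by value). */
final class ByValue {
    private ByValue() {}

    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original); // the whole object graph is written, not an address
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

/** Example shared state: serializable and free of references to local-only resources. */
class Position implements Serializable {
    private static final long serialVersionUID = 1L;
    double lat;
    double lon;

    Position(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
    }
}
```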

7.2.3 The Actual Object 

We now look at the actual objects that are shared between servers and clients. The 
psdl2java generates a skeleton version of the actual class with the method names and
their arguments. The bodies of the methods and the fields need to be filled in by the
developers.
7.2.4 Object Wrapper

Wrapping is an approach to protecting legacy software systems and commercial
off-the-shelf (COTS) software products that requires no modification of those products [1].
It consists of two parts, an adapter that provides some additional functionality for an 
application program at key external interfaces, and an encapsulation mechanism that 
binds the adapter to the application and protects the combined components [1]. 

In this context, the software being protected contains the actual distributed objects, 
and the AICG model has no way of knowing the behaviors of the distributed objects 
other than the operation types of the methods. The adapter intercepts all
invocations to provide additional functionality such as synchronization between the
local and distributed objects, transaction control, event monitoring, and exception
handling. The encapsulation mechanism was explained in section 7.1 (AICG Wrapper
Architecture). Instead of instantiating the actual object, the application instantiates
the respective interface wrapper. Instantiating the interface wrapper indirectly
instantiates the actual object and stores it in the space.

Three classes are generated for every distributed object. They are named by appending
Ext, ExtClient, and ExtServer, respectively, to the object name.



7.3 Event Modules 

The event modules consist of the event callback template (notifyAICG), the event
handler (eventAICGHandler), and the event identification object (eventAICGID).

7.3.1 Event Identification Object 

The event identification object is used to distinguish one event from others. When an 
event of interest is registered, an event identification object is created to store the 
identification and event source. The object has only two methods: an ‘equals’ method
that checks whether two event identification objects are the same, and a ‘toString’
method, which is used by the event handler to retrieve the right event objects from the
hash table.

7.3.2 Event Handler 

Event Handler is the main body of the event operation in the AICG model. It handles 
registration of new events, deletion of old events, listening for event and invoking the 
right callback for that event. The event handler in fact contains three inner classes that
perform these functions. Events are stored in a hash table with the event
identification object as the key to the hash table. This allows fast retrieval of the event 
object and the callback methods. 

The event handler listens for new events from the space or other sources. When an 
object is written to the space, an event is created by the space and captured by all
the listeners. The event handler immediately spawns a new thread and checks
whether the event is of interest to the application.
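The registration table and dispatch path of the event handler might be sketched as follows, with names echoing but not reproducing the generated classes: `NotifyAICGSketch`, `EventIdSketch`, and `EventHandlerSketch` are hypothetical, and the callback signature is an assumption. The identification object overrides `hashCode` alongside `equals`, which Java requires for correct hash-table keys. Unlike the text, this sketch checks the table before handing the callback to an `Executor`; passing `r -> new Thread(r).start()` gives a thread per event.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

/** Callback template implemented by the application (signature is an assumption). */
interface NotifyAICGSketch {
    void listenerAICGEvents(String eventKey, Object payload);
}

/** Event identification object: an identifier plus the event source. */
final class EventIdSketch {
    private final long id;
    private final String source;

    EventIdSketch(long id, String source) {
        this.id = id;
        this.source = source;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof EventIdSketch)) return false;
        EventIdSketch other = (EventIdSketch) o;
        return id == other.id && source.equals(other.source);
    }

    @Override public int hashCode() { return Objects.hash(id, source); }

    @Override public String toString() { return source + "#" + id; }
}

/** Event handler: registers, deregisters, and dispatches events to the right callback. */
final class EventHandlerSketch {
    private final Map<EventIdSketch, NotifyAICGSketch> callbacks = new ConcurrentHashMap<>();
    private final Executor dispatcher;

    EventHandlerSketch(Executor dispatcher) { this.dispatcher = dispatcher; }

    void register(EventIdSketch id, NotifyAICGSketch callback) { callbacks.put(id, callback); }

    void deregister(EventIdSketch id) { callbacks.remove(id); }

    /** Returns true if the event was of interest and its callback was scheduled. */
    boolean onEvent(EventIdSketch id, Object payload) {
        NotifyAICGSketch callback = callbacks.get(id);
        if (callback == null) return false; // not registered: ignore
        dispatcher.execute(() -> callback.listenerAICGEvents(id.toString(), payload));
        return true;
    }
}
```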
7.3.3 The Callback Template

The callback template is a simple interface class with an abstract method,
listenerAICGEvents. Its main function is to allow the AICG model to invoke the
application program when certain events of interest are “fired”. As explained earlier,
the notifyAICG interface needs to be implemented by each application that wishes to
receive notifications.



7.4 The Transaction Modules 

The transaction modules consist of a transaction interface (transactionAICG) and the 
transaction factory (transactionManagerAICG). 

The transaction interface is a group of static methods that are used for obtaining 
references to the transaction manager server somewhere on the network. It uses the 
Java RMI registry or the look-up server to locate the transaction server. 

The transaction factory uses the transaction interface to obtain the reference to the 
server, which is then used to create the default transaction or user-defined 
transactions. In short, the transaction factory can perform the following:

1. Invoke the transaction interface to obtain a transaction manager.

2. Create a default transaction with lease time of 6 seconds. 

3. Create a transaction with a user defined lease time. 

7.5 The Exception Module 

The exception module defines all the exception codes that are returned to the 
application when certain unexpected conditions occur in the AICG model. The 
exceptions include:

• "UnDefinedExceptionCode": an unknown error occurred.

• "SystemExceptionCode": system-level exceptions, such as disk failure or network
failure.

• "ObjectNotFoundException": the space does not contain the object.

• "TransactionException": the transaction server was not found, or the transaction
expired before commit.

• "LeaseExpiredException": the object lease has expired.

• "CommunicationException": space communication errors.

• "UnusableObjectException": the object is corrupted.

• "ObjectExistsException": there is another object with the same key in the space.

• "NotificationException": event notification errors.



8 Conclusion 

The AICG vision of distributed object-oriented computing is an environment in 
which, from the developer’s point of view, there is no distinct difference between 
sharing of objects within an address space and objects that are on different machines.
The model takes care of underlying interoperability issues by taking into account
network latency, partial failure, and concurrency. Automating the generation of
interface wrappers directly from the Prototyping System Description Language further
enhances the reliability of the systems by enforcing proper object-oriented 
programming styles on the shared objects. Usage of PSDL for specification of shared 
objects also results in increased efficiency and shorter development time. 
Abstract. This paper reports work in progress on the design of secure
GIS systems. Such designs are needed when sensitive GIS data must
be communicated over public networks. The ubiquity of public networks
can compromise the sensitivity of data during data exchanges in
response to a query. Cryptographic methods are designed independently
of the relative importance of data or its semantic content. The paper
argues in favour of using spatial characteristics of GIS data to obtain
an additional layer of security through morphic manipulation
of spatial objects. We discuss two families of morphic manipulation
algorithms. While the first achieves benign morphic alterations, the
second family of algorithms offers enhanced morphic alterations with
very extensive data-hiding potential. The paper limits its scope to
two-dimensional map data.



1 Introduction 

GIS and cryptology are old fields. There is enough historical evidence to suggest
that secrecy and data-hiding were practised in the communication of maps. To that
extent, the two fields have grown together in the past. Newer developments in
technology have influenced both cryptology and GIS. Today all data is digital and is,
therefore, merely bytes to a cryptographer, for whom only text-based (read: byte-based)
methods need to be explored for security. Map data, on the other hand, is also
bytes, but with a difference: it usually represents space with embedded objects that
have varying degrees of importance and sensitivity.

Typically, GIS data may be categorised as spatial or aspatial. The basic
data, though, is spatial, and the aspatial data is tied to the core spatial data. For
instance, a point (spatial attribute) on a map may have a name (aspatial attribute)
associated with it to denote that it is a mountain peak. A region may be a
polygon that actually denotes a rice-growing area. Two-dimensional map data can
be defined in terms of points, lines, and polygons with their associated semantics
and aspatial features such as mounts, rivers, and wetlands or lakes. The degree of
importance and sensitivity associated with these objects may vary considerably.
For instance, a data item like the location of a strategic observation post or the location
of a line-of-sight communication antenna may be a point. A chartered course of

S. Bhalla (Ed.): DNIS 2000, LNCS 1966, pp. 65-79, 2000. 

(c) Springer- Verlag Berlin Heidelberg 2000 







P.C.P. Bhatt 



an ocean liner with sensitive cargo, or the route of a convoy over a certain terrain, 
may be a set of lines, and this information may have to be guarded for a certain 
period of time. Similarly, the layout of a national laboratory spread over an area, 
with points for pick-up and delivery, may be very important. Such information 
may need not only to be specified but also to be exchanged between operational 
teams while observing a certain level of security. Security algorithms generally do 
not depend on the logical nature of data or its importance. A GIS user, however, 
may need to provide for security in the context of the data's use (by a user or a 
program). Such is the nature of GIS data. Therefore, the responsibility for the 
extent and the nature of security (including data-hiding) must lie with the GIS 
community. Clearly, these means have to go beyond byte-based security algorithms 
and must be defined using the characteristics of GIS data. We argue that since the 
primary character of GIS data is spatial, data-hiding algorithms too must be 
spatial algorithms. This paper is exploratory in nature, as it reports a research 
effort in its embryonic stage. 

The paper is organised as follows. In section 2 we examine data structuring 
and encoding. We argue that rasterised as well as vector forms of data 
representation can be regarded as special cases of a flexible information storage 
scheme. In section 3 we discuss a typical distributed GIS architecture. We 
also indicate where, and how, one may provide security in communicating GIS 
data to a user, or a process, in response to a query, and we very briefly discuss 
the scenario in which data may be stored at multiple sites. In section 4 we 
discuss some algorithms which provide morphic alterations to support data-hiding. 
Such a move ensures security beyond the mandatory secure communication from 
service providers or cryptographic packages. The last section concludes by 
recapitulating and reviewing the relevance of data-hiding for GIS. 



2 The Nature of Spatial Data 

Accurate spatial data is difficult to obtain and model [8]. Today much of the 
data acquisition employs satellite imagery, which is often a rasterised collection 
of bits. From this rasterised data a workable model is often derived and used 
for a specific application. Most practical applications only need two-dimensional 
maps [1]. For two-dimensional maps, points, lines and polygons seem to capture 
all the needed GIS entities [1,4]. Typically, points represent objects like mounts, 
lines represent rivers or rail-roads, and polygons represent provinces or counties. 

Spatial data is also generated by surveying and geometry. This is almost 
always two-dimensional data and is most useful for detailed land layout, 
land use and planning. Such data too appears in the form of points, lines and 
polygons. For instance, in the context of electrical utility management, map 
data may capture the location of a transformer (or even a household), an 11 kV 
line or a region serviced by a sub-station. Note that such data may as well 
be described using a vector form of data representation. So, primarily we are 
concerned with the capture of points, lines and polygons with their associated 
semantic attributes, which have quantitative, and even qualitative, interpretations 
in a map. The end user's semantic interpretation of data drives all considerations 
in data representation and structuring. 

2.1 Data Representation and Encoding 

Even though the rasterised or vector form of GIS data is a consideration, there 
is one other major consideration that impacts the organisation of a GIS 
database: how the data ought to be clustered on disk so that responses to 
queries can be expedited. To ensure fast responses to queries, map data that 
bear relationships on the map, such as proximity, must also be stored close 
together on disk. The consequence of this is that the encoding of data becomes 
very critical. 

In short, the following two major factors govern GIS design. 

— Vectorised or rasterised data. 

— Clustering of related data. 

The vector form is a direct representation. The rasterised representation, on the 
other hand, requires some inference to identify objects. Piwowar and other 
authors [1,2,4] not only compare and contrast the two but also offer methods and 
tools for interchange between various formats of data representation. They also 
discuss quality and efficiency from several perspectives. For now we shall 
assume the availability of a flexible representation of spatial objects, as shown in 
fig. 1, which is suitable for both rasterised and vector descriptions of spatial 
objects in two dimensions. With this versatile data structure we can 
encapsulate spatial, aspatial and other graphical attributes (features), as well 
as all relationships to other objects. 

Fig. 1. The Data Representation 

Clustering has often been prompted by the consideration that much of the GIS 
data is stored on disks. One needs to optimise disk access for data retrieval in 
response to queries. Search is usually optimised for the following kinds of 
spatial queries. 
— Spatial point query: requiring retrieval of point co-ordinates. 

— Spatial range query: requiring retrieval of polygons and performing 
set-theoretic operations like complementation, union or intersection between regions. 

— Aspatial point attributes: requiring retrieval of the name of a city, its postal 
zone, etc. 

— Graphical range attributes: requiring retrieval of polygons, e.g. a snow-bound 
area. 

— Relational queries: requiring retrieval in relation to a specified spatial or 
aspatial attribute. 

So, sometimes the clustering may be based on aspatial or graphical attributes, 
while on other occasions it may be based on spatial or geographical attributes. 
The GIS community has debated various ways of organising data, primarily with 
the following in mind. 

— Efficiency in responding to a class of queries; 

— Efficiency in retrieval of data often based on proximity; 

— Organisation of data based on relationships between objects; 

A query may ask for a set of spatial attributes that possibly covers a 
range. This generally requires forming a cluster based on aspatial or graphical 
attributes. Cluster retrieval based on graphical attributes can be facilitated by 
a recursive data encoding that refines a region by dividing it, at each level of 
recursion, in each of the two dimensions [3]. Other well-known quad-tree-based 
spatial data organisations are described in [5,6]. 
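To make the recursive refinement concrete, here is a minimal Python sketch. It assumes a unit-square map and names each quad-tree cell by the string of quadrant digits chosen at each level (0 lower-left, 1 lower-right, 2 upper-left, 3 upper-right); the function name `quad_cell` and this numbering are our own illustrative choices, not taken from [3], [5] or [6]. It returns the smallest cell that wholly contains an object's minimal bounding rectangle (MBR).

```python
from typing import Tuple

MBR = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def quad_cell(mbr: MBR, region: MBR = (0.0, 0.0, 1.0, 1.0), max_depth: int = 8) -> str:
    """Quadrant path of the smallest quad-tree cell that wholly contains mbr.

    Each level halves the current cell in both dimensions; the digit appended
    is 2 * (upper half?) + (right half?). An empty string means the root cell.
    """
    xmin, ymin, xmax, ymax = mbr
    x0, y0, x1, y1 = region
    path = ""
    for _ in range(max_depth):
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        # Which half in x: 0 = left, 1 = right; stop if the MBR straddles xm.
        if xmax <= xm:
            qx = 0
        elif xmin >= xm:
            qx = 1
        else:
            break
        # Same test in y: 0 = lower, 1 = upper.
        if ymax <= ym:
            qy = 0
        elif ymin >= ym:
            qy = 1
        else:
            break
        path += str(2 * qy + qx)
        x0, x1 = (x0, xm) if qx == 0 else (xm, x1)
        y0, y1 = (y0, ym) if qy == 0 else (ym, y1)
    return path
```

For example, `quad_cell((0.1, 0.1, 0.2, 0.2))` gives `"00"`. An object that straddles a dividing line stays in a large cell however small it is: both a thin strip across the map, `(0.0, 0.49, 1.0, 0.51)`, and a tiny square at the centre, `(0.48, 0.48, 0.52, 0.52)`, land in the root cell `""`.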

Yet another spatial data organisation is the field-tree. 
A field-tree employs a hierarchy of grids of varying sizes at multiple levels, with 
a different origin at each level. Thus the grids have an inherent displacement 
between levels of the hierarchy. The displacements ensure that every object in the 
map fits into a grid cell at one of the levels in the hierarchy [6,7]. In GIS the spatial 
attributes such as location and relative position are of prime importance, and these 
often influence the way GIS data gets encoded [1]. The data encoding can also be 
achieved by employing a linear scan. In fact, one group used such a scan for 
two-dimensional GIS data and succeeded in combining the benefits of both the 
quad-tree and the field-tree within one encoding scheme called SLC (spatial location 
code) [6]. We next discuss their scheme. 



The Spatial Location Code (SLC). Though SLC [6] was designed for a 
relational database organisation with considerations of efficiency in responding 
to range queries, its designers made an effort to obtain a unique encoding for each 
object. To achieve this it employs Morton codes that relate to quad-tree ranges 
within the hierarchy of field-trees. This scheme requires encoding at multiple 
levels to ultimately achieve a unique encoding for every two-dimensional object, 
whatever its shape or refinement. The authors of SLC are well justified in their 
view that a one-level quad-tree can result in the retrieval of a bulk of useless objects 
if the object being retrieved is thin and long. The main point to be noted is 
that the SLC grows only logarithmically. That is good news. One does not need 
much imagination to see how SLC would adapt to the representation we have 
chosen in fig. 1. Cryptographic methods would apply to these data as well as 
to any other form of representation. However, with SLC, cryptographic methods 
can additionally be applied at the object level, as each object has a distinct SLC 
encoding, and that is important. 
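The combination of field-tree levels and Morton codes can be sketched as follows. This is not the SLC encoding of [6], whose exact layout we do not reproduce; it is a minimal illustration under stated assumptions: a unit-square map, grids halved at each level, and a hypothetical displacement rule in which odd levels shift their origin by half a cell. `field_tree_cell` finds the finest cell that wholly contains an object's MBR, and `morton_code` interleaves the bits of the cell's column and row so that nearby cells tend to receive nearby codes. Unlike the SLC, the sketch stops at a single cell, so two objects in the same cell get the same key.

```python
import math
from typing import Tuple

MBR = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def field_tree_cell(mbr: MBR, size: float = 1.0, levels: int = 8) -> Tuple[int, int, int]:
    """Finest (level, col, row) whose cell wholly contains mbr.

    Level 0 is a single cell covering the whole map. Hypothetical displacement
    rule: odd levels shift their grid origin by half a cell.
    """
    xmin, ymin, xmax, ymax = mbr
    best = (0, 0, 0)
    for k in range(1, levels + 1):
        cell = size / 2 ** k
        offset = cell / 2 if k % 2 == 1 else 0.0
        col = math.floor((xmin + offset) / cell)
        row = math.floor((ymin + offset) / cell)
        # The MBR fits if its far corner falls in the same shifted cell.
        if (math.floor((xmax + offset) / cell) == col
                and math.floor((ymax + offset) / cell) == row):
            best = (k, col, row)
    return best


def morton_code(col: int, row: int, bits: int = 16) -> int:
    """Interleave bits: col fills the even positions, row the odd positions."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code


def slc_like_key(mbr: MBR) -> Tuple[int, int]:
    """(field-tree level, Morton code of the containing cell)."""
    level, col, row = field_tree_cell(mbr)
    return level, morton_code(col, row)
```

With this rule the small square straddling the map centre, `(0.48, 0.48, 0.52, 0.52)`, fits the displaced grid at level 3 in cell (4, 4), giving the key `(3, 48)`; a strip spanning the whole map width still falls back to level 0.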

Object Orientation. There are many consequences of employing an object-based 
GIS data representation. Primary amongst these are those relating to attributes, 
methods and relationships. For instance, attributes have names and types, some of 
which may be inherited from a super-class. Methods are defined for specific 
operations with a set of acceptable arguments; they define results, including those 
that may raise an exception. Relationships capture kinships amongst GIS objects. 
For example, objects that share a boundary, or an object and its nearest neighbour 
to the north-east, bear some form of kinship with each other. 

The Assumptions. We assume SLC encoding for objects. The basic scheme 
shown in fig. 1 has the flexibility to represent data in either vector or rasterised 
form. From the security point of view, SLC encoding is suitable for encryption 
of data. Object orientation, on the other hand, raises other considerations, like 
access rights to both the state and the methods associated with GIS objects. A 
simple read-only access may suffice to respond to a simple existential query. 
However, certain kinds of queries may need some inference (based on data 
attributes), which may require access to methods. Further, client objects may 
need to be authenticated for access. One may declare object classes final, as in 
Java; this prevents a hacker from subclassing them into look-alike classes. 
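A rough Python analogue of these assumptions is sketched below. Python has no run-time-enforced `final` keyword, so the hypothetical `GISObject` class refuses subclassing from `__init_subclass__`; aspatial attributes are exposed through a read-only `MappingProxyType`, and invoking a method (here a shoelace-formula `area`) needs a separate right. The right name `"invoke:area"` is our own invention. Python's protections are conventions that a determined caller can bypass, so this shows the shape of the access regime rather than a hardened implementation.

```python
from types import MappingProxyType
from typing import Iterable, Mapping, Set, Tuple


class GISObject:
    """Hypothetical sealed GIS object: read-only state, rights-checked methods."""

    def __init_subclass__(cls, **kwargs):
        # Analogue of a Java `final` class: refuse to create any subclass, so a
        # look-alike class cannot be derived from this one.
        raise TypeError(f"{cls.__name__}: GISObject may not be subclassed")

    def __init__(self, slc: int, attributes: Mapping[str, str],
                 polygon: Iterable[Tuple[float, float]]):
        self._slc = slc
        self._attributes = MappingProxyType(dict(attributes))  # read-only aspatial data
        self._polygon = tuple(polygon)

    @property
    def slc(self) -> int:
        return self._slc

    @property
    def attributes(self) -> Mapping[str, str]:
        # Read-only access suffices for simple existential queries.
        return self._attributes

    def area(self, rights: Set[str]) -> float:
        # Invoking a method is a separate right from reading state.
        if "invoke:area" not in rights:
            raise PermissionError("caller may not invoke area()")
        pts = self._polygon
        # Shoelace formula over consecutive vertex pairs, closing the ring.
        twice = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
        return abs(twice) / 2
```

A client holding only read rights can inspect `attributes` but gets a `PermissionError` from `area`, and `class Fake(GISObject): ...` fails with a `TypeError` at class-creation time.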

The notions above define the scope of our current concerns. As we shall see 
in section 3, these concerns help us define encryption protocols to support a 
security-based access control regime. In fact, as an owner or as a user, we can 
embed the security at the object level before transmission over a public network, 
whose ubiquity makes it vulnerable in the first place. 

3 Security in Distributed GIS 

In this section we first discuss a generic distributed GIS architecture and then 
show security-related enhancements for the DOGIS [4,12] architecture. We note that 
the security scenarios range from simple user access to spatial data, to a case 
where results are synthesised by merging responses from multiple sites. While 
it is customary to use secure and authenticated user access, it helps to have a 
data-hiding capability for sensitive GIS data in a distributed GIS. Basically, we 
try to determine where, and how, we may provide for data-hiding. 

3.1 Distributed GIS 

A distributed GIS may be organised in several ways. For instance, each aspect 
or feature of a map may be separately depicted on copies of the same map on 
different machines. This is usually the case for multi-agency operations [1]; these 
agencies operate in CSCW mode or may need to share and exchange their data. 
Alternatively, a large GIS may be split along some boundaries, such as northern, 
southern and central [4], with a separate site for each such segment of the GIS. 
In the latter case, all features of the GIS corresponding to a segment may be 
available at its site. In either case, what is important is that a distributed GIS 
has its data stored on multiple machines, presumably each at a different site. In 
any architecture supporting a distributed GIS, it is stipulated that queries may be 
raised from any site. 

Fig. 2. A Sample Architecture: DOGIS 

We show a generic architecture in the top part of fig. 2 where the following 
is stipulated. 

— A distributed GIS is a collection of several part GISs each on a separate site. 

— Each site has a local GIS. 

— Each site has a query server for local queries. 

— Each part also has an interface to raise global queries. These may need access 
to data residing on some other site(s). 
— Each site incorporates, or has access to, an analysis service. This service 
can analyse a query to determine and identify both the local and the non-local 
parts of the query. 

— Each site can recognise trusted clients, i.e. other sites or authenticated 
users or processes. 

— Each site supports a communication service which can send or receive 
messages for the following. 

• Queries or subqueries. 

• Objects and object-based services, including methods. 

• Presenting a synthesised response by merging partial results received from 
various sites. 

DOGIS [4] is a distributed GIS architecture that satisfies the above requirements. 
The enhanced DOGIS architecture [12] (fig. 2) includes the following services. 

— Communication services: These include atomic and selective broadcast and 
synchronisation services similar to ISIS [13]. 

— Query services: The query server contacts the meta knowledge base server to 
determine which object base servers may need to be contacted to respond 
to the current query. The responses from the object bases are collated by the 
query server. 

— Object base services : The object base services include the checks required 
for object access privileges to data as well as methods. 

— Fault tolerance services : These include data and process resilience, consis- 
tency under failure and continued down-graded services under failure. 

— Security services : To offer cryptographic checks and data-hiding mechanisms 
(at the gateway). 

With this security enhancement we describe the following protocol to service 
a distributed query. 

— Step 1: The gateway receives the query from an external user or a client 
process. 

— Step 2: The gateway authenticates the user or process. 

— Step 3: The gateway communicates the query to the query server. 

— Step 4: The query server (which is now a client) checks on the object access 
rights of the user with the other servers. 

— Step 5: The query server assembles the response and sends it to the gateway. 
— Step 6: The gateway performs data-hiding operations before transmission on 
public network. 
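The six steps can be traced in the following minimal in-memory sketch. The classes `Gateway`, `QueryServer` and `ObjectServer`, the shared-secret check and the `hide` callback are all hypothetical stand-ins for the DOGIS services, not an API from [4] or [12]; objects are keyed by integer SLCs, and the meta knowledge base is simplified away by asking every object server.

```python
import hmac
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Response = Dict[int, str]  # SLC -> object payload


@dataclass
class ObjectServer:
    """Hypothetical object base server with a per-user access list."""
    objects: Response
    acl: Dict[str, Set[int]]

    def fetch(self, user: str, slcs: List[int]) -> Response:
        # Object base services: return only the objects this user may read.
        allowed = self.acl.get(user, set())
        return {s: self.objects[s] for s in slcs if s in allowed and s in self.objects}


@dataclass
class QueryServer:
    object_servers: List[ObjectServer]

    def answer(self, user: str, slcs: List[int]) -> Response:
        # Step 4: each object server checks the user's access rights.
        # Step 5: the query server assembles the partial results.
        response: Response = {}
        for server in self.object_servers:
            response.update(server.fetch(user, slcs))
        return response


@dataclass
class Gateway:
    credentials: Dict[str, bytes]
    query_server: QueryServer
    hide: Callable[[Response], Response]

    def handle(self, user: str, secret: bytes, slcs: List[int]) -> Response:
        # Step 1: the query arrives from an external user or client process.
        # Step 2: authenticate; a shared-secret comparison stands in for the
        # stronger one-way-function checks of section 3.2.
        expected = self.credentials.get(user)
        if expected is None or not hmac.compare_digest(expected, secret):
            raise PermissionError("authentication failed")
        # Step 3: pass the query to the query server.
        response = self.query_server.answer(user, slcs)
        # Step 6: data-hiding before transmission on the public network.
        return self.hide(response)
```

Any data-hiding transformation, such as the grid resizing of section 4, could be passed as `hide`; objects the user may not read are silently dropped by the object servers rather than reported.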



3.2 Security Considerations 

To begin with, the simplest case is when a user attempts to access some data. 
Clearly, the system must perform an authentication check on users before 
permitting data access. The minimal authentication may employ a password 
check. More sophisticated authentication, however, would require the result of a 
one-way function [9] that the verifier (the system holding the data) is capable of 
computing and the prover (the user) is capable of supplying. One-way functions 
are interesting because one cannot infer the input from the output. So even if a 
wire tap occurs, the authentication will pass only for a genuine user. Also, there 
are schemes in which authentication is spread over multiple sessions, or is valid 
for a fixed number of sessions only. When a user or a process needs to 
communicate some data, two cases may arise. 

— When they have access to a shared data-base. 

— When users or processes do not share data. 
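Before turning to these two cases, here is one well-known way to realise both the one-way-function check and the fixed-number-of-sessions idea mentioned above: Lamport's hash-chain (one-time password) scheme. The paper does not prescribe it; we use it, with SHA-256 as the one-way function, purely as an illustration. The prover keeps a secret seed and the chain of its hashes; the verifier stores only the last value it accepted and checks that hashing a presented token once reproduces that value.

```python
import hashlib
import hmac
from typing import List


def one_way(data: bytes) -> bytes:
    """SHA-256 as the one-way function: cheap to compute, infeasible to invert."""
    return hashlib.sha256(data).digest()


def make_chain(seed: bytes, n: int) -> List[bytes]:
    """Prover side: chain[i] is one_way applied i times to the secret seed."""
    chain = [seed]
    for _ in range(n):
        chain.append(one_way(chain[-1]))
    return chain


class Verifier:
    """System holding the data; it stores only the last value it accepted."""

    def __init__(self, anchor: bytes):
        self.last = anchor  # chain[n], registered once over a trusted channel

    def verify(self, token: bytes) -> bool:
        # Accept the token only if hashing it once yields the stored value.
        if hmac.compare_digest(one_way(token), self.last):
            self.last = token  # the next session must present the preceding value
            return True
        return False
```

The prover presents `chain[n-1]` in the first session, `chain[n-2]` in the next, and so on, so a chain of length n authenticates at most n sessions. A wire-tapped token is useless for the next session because the preceding value cannot be derived from it without inverting SHA-256.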

We discuss the issues in the context of a data encoding such as SLC. In the first 
case the two users (or processes) need to exchange a session key, and the sender 
communicates only the SLC codes for the identified objects. Any of the 
well-known session key selection procedures can be followed to establish a session. 
The receiver can then access the objects from the database himself upon receiving 
their SLC codes. Note that we are transmitting only text in this case. 
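The first case can be sketched as follows, under stated assumptions: a toy Diffie-Hellman exchange over the Mersenne prime 2^127 - 1 with base 3 (far too small and unvetted for real use) stands in for "any of the well-known session key selection procedures", and the SLC codes are sent as JSON with an HMAC-SHA-256 tag under the session key. The sketch only authenticates the codes; a real deployment would also encrypt them with a vetted cipher. The integer codes and the dictionary standing in for the shared database are illustrative.

```python
import hashlib
import hmac
import json
import secrets
from typing import Dict, List, Tuple

P = 2**127 - 1  # a Mersenne prime; toy-sized, NOT a vetted Diffie-Hellman group
G = 3           # illustrative base


def dh_keypair() -> Tuple[int, int]:
    """Return (private exponent, public value G^priv mod P)."""
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)


def session_key(my_priv: int, their_pub: int) -> bytes:
    """Both parties derive the same 32-byte key from the shared DH value."""
    shared = pow(their_pub, my_priv, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def send_slcs(key: bytes, slcs: List[int]) -> Tuple[bytes, bytes]:
    """Sender: transmit only the SLC codes, with an HMAC tag under the session key."""
    payload = json.dumps(slcs).encode()
    return payload, hmac.new(key, payload, hashlib.sha256).digest()


def receive_slcs(key: bytes, payload: bytes, tag: bytes, database: Dict[int, str]) -> List[str]:
    """Receiver: check the tag, then look the objects up in the shared database."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("message authentication failed")
    return [database[code] for code in json.loads(payload)]
```

Only the short codes cross the network; the objects themselves are fetched by the receiver from its own copy of the shared database.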

In the second case, when spatial data is exchanged during a session with a 
secure session key, the security may be greatly enhanced by employing 
data-hiding. Owing to the two levels of security, the receiver would have to unravel 
the data after decryption. Data-hiding is particularly important because certain 
seemingly harmless combinations of read-only permissions can lead to the 
disclosure of unintended data, often inferred using a set of statistical and range 
queries. In such cases an additional level of data-hiding can be very useful. The 
main point being made is that gateways or query servers at local sites in a 
distributed GIS must employ steganography before sending any response on 
sensitive data. This has the advantage that a masquerader is presented with data 
that is, at the very least, morphologically altered [10]. Only a genuine user may 
know the transformation or retrieval algorithm needed to decode it. We shall 
clearly require that the gateway has this additional capability, which is particularly 
important when the data is collated from distributed sites. 

The gateway receives either raw data or the SLCs of all objects. In the latter 
case it may generate the data for the transformation. For the present discussion 
we shall assume that it is raw data that needs the transformations. 

4 Algorithms for Morphic Alterations for Data Hiding 

In this section we outline morphic manipulation algorithms. In particular, we 
define a generic framework for a family of algorithms. With such a family one 
can choose to invoke an appropriate level of complexity depending upon the 
security requirements. 

4.1 Minimally Secure Method: Family MOX 

First, we note that we are dealing primarily with two-dimensional spatial data. 
Such data may have been generated in response to a subquery. For illustration 
of the algorithm we shall take it that the data lies within a minimal bounding 
rectangle which spans (X,Y) relative to (0,0). We identify this region 
simply as (X,Y). Objects within the region (X,Y) shall be identified as O_i, 
where i denotes an index on objects in the region. Let us mark the region 
(X,Y) with an n × m grid. Further, we think of a sequence of grids along the X 
axis as a horizontal strip; similarly, along the Y axis we can identify vertical strips. 
In fig. 3, grid numbers 1 through 6 form a horizontal strip, while grid numbers 6, 
12 and 18 form a vertical strip. To achieve a morphological alteration we shall 
perform a sequence of random resizings of the grids. First we resize all horizontal 
strips and next all vertical strips. All grids along a strip undergo the same random 
percentage change in the orthogonal direction, i.e. a horizontal strip changes in 
height and a vertical strip changes in width. The percentage changes along the X 
(width) or Y (height) directions are obtained using a uniformly distributed 
random number. 

Fig. 3. Application of MOX Algorithm (panels: the original map and the transformed map; the morphological alterations are minimal and the object order is completely preserved) 

We briefly examine the effects of the random resizing operation. 

— Relative grid positions: The relative grid positions remain the same, i.e. 
all grids keep the same neighbourhood relationships as they previously 
had. The grid size changes, with each side multiplied by a different random 
number. 

— An empty grid (fig. 3): The transformed grid remains empty. No new 
object is added to it, and no old object moves into it. 

— A point in a grid (fig. 3): The points within a grid are relocated in coordinate 
values. The new values are the values transformed by the resizing of the 
grid; the change in the coordinate values (within the grid) is in the same 
proportion as the change in the sides of the grid. 

— A line spanning two separated grids (fig. 3): The transformed line still 
spans the corresponding new grids, with its two end points relocated in 
their respective grids and joined by a line. Note that non-intersecting lines 
remain non-intersecting. This follows from the fact that the point order in 
both the X and Y coordinate values remains unchanged in every grid. So, if 
a point lies above (below) or to the left (right) of a line, it retains that 
relationship with the line after the transformation, over all the grids the 
line spans. Thus the set of points on a line l_1 retains its relationships with 
those on another line l_2. This also explains how polygons get mapped. 
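The random resizing and its inverse (property P6 below) can be sketched as a piecewise-linear remapping of each coordinate axis. In this minimal Python illustration the uniform scaling range 0.5 to 1.5, the helper names and the use of a seeded `random.Random` (whose seed could play the role of the shared secret) are our assumptions; the paper fixes only that each strip is rescaled by its own uniformly distributed random factor.

```python
import bisect
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def uniform_edges(length: float, cells: int) -> List[float]:
    """Strip boundaries of the original uniform grid along one axis."""
    return [length * j / cells for j in range(cells + 1)]


def random_edges(length: float, cells: int, rng: random.Random,
                 low: float = 0.5, high: float = 1.5) -> List[float]:
    """Strip boundaries after each strip is resized by its own uniform random factor."""
    edges = [0.0]
    for _ in range(cells):
        edges.append(edges[-1] + (length / cells) * rng.uniform(low, high))
    return edges


def remap(v: float, src: Sequence[float], dst: Sequence[float]) -> float:
    """Move v from strip i of src to the same relative position in strip i of dst."""
    i = min(max(bisect.bisect_right(src, v) - 1, 0), len(src) - 2)
    t = (v - src[i]) / (src[i + 1] - src[i])
    return dst[i] + t * (dst[i + 1] - dst[i])


def mox(points: Sequence[Point], xs_src: Sequence[float], ys_src: Sequence[float],
        xs_dst: Sequence[float], ys_dst: Sequence[float]) -> List[Point]:
    """Apply the strip-wise resizing to every point; swap src and dst to invert it."""
    return [(remap(x, xs_src, xs_dst), remap(y, ys_src, ys_dst)) for x, y in points]
```

Because each strip's mapping is strictly increasing, x and y orderings are preserved, and calling `mox` with the source and destination edges swapped recovers the original coordinates up to rounding.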

Next we describe our first algorithm MOO. 

Procedure MOO 

begin 

Authenticate user (process); /* a cryptographic check */ 

Generate and exchange session key; 

Randomly resize the (n,m) grid over the region (X,Y); 

Use a new uniform grid and get an SLC code for every O_i; 

Transmit the SLC codes or objects as required; 

Terminate the session. 

end 

An Appraisal of MOO. Algorithm MOO has the following properties: 

— Number of objects (P1): It preserves the number of objects. This is 
true as no new objects (points, lines and polygons) are generated or added 
during random resizing of the grids. 

— Relative point positions (P2): It preserves the relative position of points. 
We consider the two cases that arise. 

• Points within a grid: As the x (y) coordinates of all these points are 
scaled by the same factor, all relative x (y) orderings in a grid are maintained. 
Therefore, the translations resulting from random resizing do not alter the 
relative positions of points within a grid. 

• Points from different grids: The horizontal strip order as well as the 
vertical strip order remain unchanged, keeping the corresponding grid 
order to the north, south, east or west. Therefore, the points preserve their 
relative positions. 

— Object order (P3): The transformation is object-order preserving. This 
follows from the fact that the MBRs (minimal bounding rectangles) of all 
the objects retain their relative order in the X or Y direction. 

— Shape preservation (P4): The transformation is not shape preserving. 
Proof: Consider three consecutive points P_1, P_2, P_3 on any polygon with 
at most two points sharing a grid. The preservation of convexity (concavity) 
[10] amongst these three cannot be guaranteed under the transformation. Consider 
the angles subtended with either of the two axes at one of the extreme points 
(a point with the highest x or y coordinates, or for that matter (0,0)). For 
simplicity, let us choose (0,0) and angles with the X axis. Because points in 
different grids undergo independent amounts of relative translation, the changes 
in the angles subtended at (0,0) with the X axis are in general unpredictable. 
Therefore, the convexity (concavity) of three consecutive points cannot be 
guaranteed. We shall reckon a shape difference resulting from a transformation 
of convexity to concavity (or the other way around) as an indentation. So, 
an n-gon remains an n-gon, with possibly some indentations introduced. 
— Number of indentations (P5): In an N-sided polygon no more than 
N - 2 indentations can be introduced using MOO. This follows from the fact 
that we need 3 consecutive points to get one indentation. So we may have 
at most 1 + (N - 3) = N - 2 indentations generated for a closed polygon. 

— Inverse operations (P6): It is possible to undo the morphological changes 
and retrieve the points of the original map. This follows from the fact that 
multiplication by a non-zero real number has an inverse. This is an important 
consideration, from the security point of view, for a symmetric key operation. 
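Property P4 can be checked on a small hand-picked example. The sketch below reuses the piecewise-linear view of the MOO resizing (repeated so the block is self-contained), with factors chosen by hand rather than at random: on a 3 × 3 grid of unit cells only the right-most vertical strip is widened three-fold. The sign of the cross product of P_1P_2 and P_1P_3 gives the turn direction at P_2. The x and y orders of the three points, one per grid, survive the resizing, yet the turn flips, which is exactly an indentation.

```python
import bisect
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def remap(v: float, src: Sequence[float], dst: Sequence[float]) -> float:
    """Piecewise-linear strip remapping, as in the MOO resizing step."""
    i = min(max(bisect.bisect_right(src, v) - 1, 0), len(src) - 2)
    t = (v - src[i]) / (src[i + 1] - src[i])
    return dst[i] + t * (dst[i + 1] - dst[i])


def turn(p1: Point, p2: Point, p3: Point) -> float:
    """Cross product of P1P2 and P1P3: > 0 for a left turn at P2, < 0 for a right turn."""
    return (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])


grid = [0.0, 1.0, 2.0, 3.0]     # original 3 x 3 grid of unit cells
widened = [0.0, 1.0, 2.0, 5.0]  # hand-picked resizing: last vertical strip three times wider
original: List[Point] = [(0.5, 0.5), (1.5, 1.5), (2.5, 2.9)]  # one point per grid
resized = [(remap(x, grid, widened), remap(y, grid, grid)) for x, y in original]
```

Here `turn(*original)` is about 0.4 (a left turn) while `turn(*resized)` is about -0.6 (a right turn), since P_3 moves from x = 2.5 to x = 3.5.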

As an algorithm, MOO brings about a morphological change that alters the 
coordinate values but preserves most of the semantics of the original map. 
To that extent it compromises security, as it provides little, if any, data-hiding. 
At best it can be said to add some confusion with minimal data-hiding. 

To keep our sense of a family of algorithms, we propose a minor extension to 
MOO and make it into algorithm MO1 simply by taking more points along every 
line of every polygonal object. Further, we can enhance data-hiding considerably 
by adding fictitious objects to the existing map. As a simple extension we may add 
objects that do not obscure any of the existing objects; yet another class of 
extension may introduce obscuring objects. Notwithstanding the enhanced 
confusion due to the introduction of a few additional indentations, the main 
drawback remains object order preservation, which leads only to benign morphic 
alterations and in turn limits the utility of the MOX family of algorithms. 

4.2 Secure Methods: Family MIX 

The methods of family MOX are basically insecure because the basic map data 
semantics are preserved. This is due to the preservation of object order over a 
region. To go beyond object order preservation we will design another family 
of algorithms, using a space-scanning step. The scan may use any of the 
well-known methods, like Cantor-diagonal, Row-prime or Spiral order, or 
the n-curve [8]. We will, however, use a Simple_scan, described below. We will 
assume that our region of interest is bounded by boundaries identifiable as 
Left-most, Right-most, Bottom-most and Top-most. Also, during the scan we 
shall mark the following categories of points as points of interest, forming a point 
chain. 

— All point objects; 

— Grid points aligned with object points along the Y boundaries; 

— All points in line or polygonal objects; 

— Points of intersection of scan line with any of the objects. 

Now we describe our Simple_scan (fig. 4). 
Fig. 4. Application of MIX Algorithm (note the morphological alterations that can be obtained when we preserve only X order; generating such a figure may need scaling to get to the same size as the original; it is also noteworthy that we obtain intersections on boundaries) 

Procedure Simple_scan 

begin 

Step 1 : From (0, 0) scan along the Left-most border till the Top-most border is 
reached, marking points of interest en route. 

Step 2 : Turn right to scan along the Top-most border till we reach the 
left-most point of interest within the region (X,Y) (a map point, 
or a point on a line or polygon object). 

Let us call this point p. 

Step 3 : Scan vertically downwards through point p till we reach the 
Bottom-most border. 

Step 4 : Turn left along the Bottom-most border till we align with the next 
left-most point, which is to the immediate right of p. 

Step 5 : Turn left again and continue the scan vertically, marking all 
the points of interest, till the Top-most border is reached. 

Step 6 : Repeat steps 2 through 5 till the Right-most border has been 
scanned. This completes the marking of all the points of 
interest in the region (X, Y). 

end 
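A minimal sketch of the column ordering produced by Simple_scan is given below, assuming the points of interest have already been collected (including the border points and the scan-line intersections listed above). Points with the same x form one vertical pass; the Left-most border (x = 0) is scanned upwards (Step 1), and subsequent passes alternate downwards (Step 3) and upwards (Step 5). Treating each distinct x value as one pass, and counting the border as the first pass even when it holds no points, are our reading of the procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def simple_scan(points: Iterable[Point]) -> List[Point]:
    """Order points of interest into the serpentine point chain of Simple_scan."""
    columns: Dict[float, List[Point]] = defaultdict(list)
    for x, y in points:
        columns[x].append((x, y))
    # The Left-most border (x = 0) is always the first, upward pass. If no point
    # lies on it, the first column holding points is the second pass (downward).
    shift = 0 if 0 in columns else 1
    chain: List[Point] = []
    for k, x in enumerate(sorted(columns)):
        upward = (k + shift) % 2 == 0
        # Upward passes list points by increasing y, downward passes by decreasing y.
        chain.extend(sorted(columns[x], key=lambda p: p[1], reverse=not upward))
    return chain
```

For instance, the points (0, 0), (0, 2), (1, 1), (1, 3), (2, 0), (2, 2) come out as (0, 0), (0, 2), (1, 3), (1, 1), (2, 0), (2, 2).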

We now describe algorithm MIO using the same data as in fig. 3. In this 
description we shall skip mentioning steps for authentication and key exchange 
as these shall be assumed. 

Procedure MIO 

begin 

Step 1 : Use Simple_scan to scan region (X, Y), marking all the points 
of interest. 

Step 2 : Randomise the distance between each pair of successive points in the scan. 

Step 3 : Using the new distances obtained in the above step, redraw 
the map. 

Step 4 : Scale this map to the required size. 

Step 5 : SLC encode this map for transmission. 

end 
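Steps 2 to 4 of MIO admit several readings; the sketch below takes one of them as an assumption. Each displacement between successive points in the chain is multiplied by its own random factor drawn uniformly from 0.5 to 1.5 (a hypothetical range), the chain is redrawn from its first point, and the result is rescaled to the requested width and height. Because Simple_scan never moves left, every x displacement is non-negative and stays so after scaling, which reproduces the X-order preservation discussed in the appraisal below, while the y positions drift with the accumulated random factors. Step 5 (SLC encoding) is omitted.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def mio(chain: List[Point], rng: random.Random, width: float = 1.0, height: float = 1.0,
        low: float = 0.5, high: float = 1.5) -> List[Point]:
    """Hypothetical sketch of MIO Steps 2-4 on a point chain from Simple_scan."""
    # Steps 2 and 3: rescale each step of the chain by its own random factor
    # and redraw the chain from its first point.
    redrawn: List[Point] = [chain[0]]
    for (xa, ya), (xb, yb) in zip(chain, chain[1:]):
        r = rng.uniform(low, high)
        px, py = redrawn[-1]
        redrawn.append((px + r * (xb - xa), py + r * (yb - ya)))
    # Step 4: scale the redrawn map to the required width x height.
    xs = [p[0] for p in redrawn]
    ys = [p[1] for p in redrawn]
    x_lo, y_lo = min(xs), min(ys)
    sx = width / (max(xs) - x_lo) if max(xs) > x_lo else 0.0
    sy = height / (max(ys) - y_lo) if max(ys) > y_lo else 0.0
    return [((x - x_lo) * sx, (y - y_lo) * sy) for x, y in redrawn]
```

The seed of `rng` would have to be shared with the genuine receiver, just as the resizing factors would be for MOO.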

In fig. 4 we have identified 54 points of interest, marked in sequence from 0 
to 53. The randomised displacement between consecutive points in the point chain 
is noteworthy. 

An Appraisal of MIO. Algorithm MIO has many desirable properties. 

— Point objects: Our Simple_scan proceeds from left to right, so the point order in 
the X direction is preserved. However, a point above or below another within a grid 
(or outside it) may not map with the same property. In other words, the Y ordering 
amongst points may be altered beyond recognition. This also means that 
non-intersecting lines in the original map may be transformed into intersecting 
lines. 

— A line object: If the scan intersects a line at m intermediate points, it 
may introduce up to m shape alterations within the line. 

— Shape preservation: As a consequence of the alterations in point and line 
objects, no convex (concave) object can be expected to preserve its shape. A 
polygon may have nearly as many shape alterations as there are points of 
intersection with the scan line. Also, lines far apart in a polygon may end 
up intersecting, as seen in fig. 4. 

— Object order: With at best X-direction order preservation, we cannot 
guarantee object order preservation in the Y direction. 

The last point made above is crucial: the X order preservation must be 
eliminated for a truly secure system. In fig. 5 we show how our map could be 
split into eight segments, each of which could be scanned. In fig. 5 the scan 
would be in X for segments 1, 3, 5 and 7 and in Y for segments 
2, 4, 6 and 8. The scan is such that it basically forms a closed curve. Such a 
scan will preserve X or Y order amongst the points within the same segment, but 
we cannot be certain about the order between points across segments. 
This is important. 

Fig. 5. Application of MIX in X and Y directions (notes: the symbols 'X' and 'Y' denote the nature of the scan being undertaken; the numbers 1 through 8 denote the sequence followed in scanning the map; the dotted line corresponds to the scan exactly as shown in the left panel, with points numbered 1-70 covering the map) 

Other extensions of this family can be made in the following ways: 

— Add objects : We can enhance data-hiding considerably by adding objects 
both of obscuring and non-obscuring kind to the map. 

— Object based encoding : We can delimit scans to MBRs of individual objects. 

— Image embedded map encoding : We can embed all the map pixels within a 
certain MBR in an image [11]. One may even resort to using one image for 
every MBR. 

5 Conclusions 

We have seen that, besides a certain logical view of the data, technological 
imperatives require that the data be represented and encoded in a certain way. 
The object-based representation helps in security considerations, including 
data-hiding. The two-layered security using cryptography and steganography should 
provide a secure way to exchange sensitive data across public networks. The 
families of algorithms discussed here give ample opportunities to explore 
data-hiding options at different levels of security. 
Abstract. Database schema integration is an important discipline for 
constructing heterogeneous multidatabase systems. Fuzzy information has also 
been introduced into relational databases and has been extensively studied. 
However, the issues of integrating local fuzzy relations are rarely addressed. In 
this paper, we focus on schema integration of fuzzy multidatabase systems. We 
identify the conflicts involved in schema integration and provide a 
methodology for resolving these conflicts. 



1 Introduction 

The in-depth application of computer and information technologies has created a 
requirement for sharing information resources among different sites. In the 
manufacturing area, for example, typical applications of information sharing can be 
found in computer-integrated manufacturing (CIM), concurrent engineering (CE) and 
virtual enterprises (VE) based on global manufacturing. The evolution of network 
and database technologies makes it possible to achieve information integration of 
multiple databases. 

The difference between the schemas of component databases is called schema 
heterogeneity. Schema heterogeneity takes many forms. A difference in data model is one 
kind: for example, one database may be based on the relational data model and the other 
on the object-oriented data model. Heterogeneity can also arise between two component 
databases with the same data model. For example, two attributes belonging to two 
different relations may be semantically related but have different names, different 
data types or different units of measure, or they may be semantically unrelated but 
have the same name. In addition, when the schemas of component databases are not union 
compatible, problems of missing data arise. Such heterogeneity produces conflicts. 
Choosing a proper schema as the schema of the target database for all component 
databases and resolving the possible conflicts among component databases are the two 
major tasks that must be addressed first in database integration. The target databases 
are then produced with operations such as outerjoin or outerunion. Database integration 
has been a major research area in recent years, and many issues related to schema 
integration have been extensively studied. While some technical problems have been 
fully addressed, others still remain unsolved. 

It should be noted, however, that current research on database integration mainly 
focuses on integrating crisp component databases. Little has been reported on 
integrating fuzzy component databases. Information in real-world applications is often 
vague or ambiguous. In order to represent and process such imperfect information, 
fuzzy information has been introduced into relational databases. Modeling fuzzy 
information in object-oriented databases and in conceptual data models such as ER, 
EER and IFO has also received increasing attention. Therefore, the integration of fuzzy 
component databases is an essential requirement for the application of fuzzy databases 
and for the development of integrated database systems [26]. In this paper, we identify 
the conflicts in fuzzy multidatabase systems and provide a methodology for resolving 
these conflicts. 

The remainder of the article is organized as follows. Section 2 presents some basic 
notions of database integration and fuzzy relational databases. In Section 3, 
conflicts in integrating fuzzy component databases are investigated. Section 4 gives 
the methodologies for resolving conflicts and implementing integration. Section 5 
concludes this article and outlines future work. 



2 Background 

2.1 Schema Integration and Conflicts 

There are several approaches for implementing schema integration in heterogeneous 
multiple databases. The first approach is to merge the individual schemas of component 
databases into a single global conceptual schema for all independent databases 
[2, 8, 20]. This approach requires complete integration, i.e., all local schemas are 
mapped to the global schema. The second approach is to adopt a so-called federated 
database system [12]. Unlike the first approach, a federated database system has no 
global schema for all component databases; only a schema describing the data to be 
accessed by the application, called "a partial schema", is created in the local 
databases. This approach requires only partial integration. Notice that target 
databases based on a global schema and federated databases are physical databases, 
with fixed mappings between the component databases and the target databases. Because 
a minor change in a component database can cause a large variation in the target 
databases, such mappings are difficult to maintain. In general, these approaches 
impose some restrictions on the component databases. 
The third approach is to create the target databases dynamically by providing users 
with a multidatabase query language [7, 15, 16, 17]. Neither global nor partial schemas 
are needed with this approach, and the target databases are essentially view-based 
databases; in other words, they are logical or virtual databases. Unlike a view in a 
traditional relational database, the view relation here must resolve the possible 
conflicts. Because this approach places no restrictions on component databases, it is 
widely adopted for integrating heterogeneous multidatabase systems. 

In database schema integration, a core problem is to identify the same real-world 
object in different component databases and then to resolve the many incompatibilities 
that exist among them. Identifying whether tuples from component databases describe the 
same real-world object is a difficult and interesting issue. If component relations 
have a common key, component tuples with the same key values must describe the same 
real-world object; otherwise, other attributes must be compared. In this paper, we 
assume that such a key exists in the component relations. 

Let r and s be component relations from different component databases, and let t_r and 
t_s be their tuples, called component tuples, respectively. If t_r and t_s describe the 
same real-world object, namely, they have the same attribute values on the common key, 
then t_r and t_s can be integrated to produce a single tuple t, called the target tuple, 
with the outerjoin [4] or outerunion [18, 23] operation after resolving the conflicts. 
According to the semantic relationship between t_r [Ai] and t_s [Aj], four important 
types of conflicts are generalized as follows [9, 23]: 

(a) Naming conflicts. This type of conflict can be divided into two cases: 
semantically related data items that are named differently, and semantically 
unrelated data items that are named identically. 

(b) Data type conflicts. This case occurs when semantically related data items are 
represented in different data types. 

(c) Data scaling conflicts. This case occurs when semantically related data items 
are represented in different databases using different units of measure. 

(d) Missing data. This case occurs when the schemas of component databases have 
different attribute sets. 

The conflict of missing data can be resolved by using the outerunion operation, with 
null values appearing in the target tuples. For the other conflicts, mappings of 
attribute values from the attributes of component tuples to the virtual attributes [9] 
of target tuples are necessary. Depending on the concrete conflict, one-to-one, 
many-to-one, and one-to-many mappings can be identified. Naming conflicts and data type 
conflicts can be resolved with one-to-one mappings. Data scaling conflicts can be 
resolved with either many-to-one or one-to-many mappings, depending on the actual 
situation. For the first two mappings, the result is still an atomic value of the 
virtual attribute. For the last mapping, however, the result is a special value of the 
virtual attribute, the partial value, in which exactly one of the values is the true 
value [9]. 
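The resolution of missing data by outerunion can be sketched in Python. This is a minimal illustration under stated assumptions, not the operator defined in [18, 23]: relations are modelled as lists of dicts, tuples are matched on a common key (the key assumption above), every attribute appearing in either schema is kept, and an attribute missing from a component tuple becomes None (null). It assumes the other conflicts have already been resolved, so on a shared attribute it keeps the first non-null value. The function name `outer_union` is hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def outer_union(r: List[Row], s: List[Row], key: str) -> List[Row]:
    """Integrate r and s on a common key; attributes absent from a tuple become None."""
    # Target schema: union of both attribute sets, in first-seen order.
    attrs: List[str] = []
    for row in r + s:
        for a in row:
            if a not in attrs:
                attrs.append(a)

    merged: Dict[Any, Row] = {}
    for row in r + s:
        # Tuples with the same key value describe the same real-world object.
        target = merged.setdefault(row[key], {a: None for a in attrs})
        for a, v in row.items():
            # Assumes remaining value conflicts were resolved beforehand,
            # so the first non-null value for an attribute is kept.
            if target[a] is None:
                target[a] = v
    return list(merged.values())
```

For example, integrating a relation on {ID, Name, Age} with one on {ID, Name, Major} yields target tuples on {ID, Name, Age, Major}, with Age set to None for any object known only to the second relation.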



2.2 Fuzzy Set and Fuzzy Relational Databases 



Fuzzy relational databases have been extensively studied to enhance the capability of 
traditional relational databases for describing and processing imprecise information. 
Fuzzy data were originally described as fuzzy sets by Zadeh [24]. Let U be a universe 
of discourse. A fuzzy value on U can be characterized by a fuzzy set F in U. A 
membership function μ_F: U → [0, 1] is defined for the fuzzy set F, where μ_F (u), for 
each u ∈ U, denotes the degree of membership of u in the fuzzy set F. Thus, the fuzzy 
set F is described as follows: 

$$F = \{\mu_F(u_1)/u_1,\ \mu_F(u_2)/u_2,\ \ldots,\ \mu_F(u_n)/u_n\}. \qquad (1)$$

The above-mentioned membership function μ_F (u) can be interpreted as a measure of the 
possibility that a variable X has the value u, where X takes values in U. In this case, 
a fuzzy value can also be described by a possibility distribution π_X [25]: 

$$\pi_X = \{\pi_X(u_1)/u_1,\ \pi_X(u_2)/u_2,\ \ldots,\ \pi_X(u_n)/u_n\}.$$

Here, π_X (u_i), u_i ∈ U, denotes the possibility that X = u_i is true. Let π_X and F be 
the possibility distribution representation and the fuzzy set representation for a 
fuzzy value, respectively. It is apparent that π_X = F is true. 

In connection with the possibility distribution representation of fuzzy data, there 
exist two basic extended data models for fuzzy relational databases [3, 21, 22]. In one 
of these models, attribute values are represented by possibility distributions. In the 
other, tuples are associated with membership degrees (possibilities) while attribute 
values are crisp. Based on these two basic fuzzy relational models, there is an 
extended fuzzy relational model in which possibility distributions and membership 
degrees arise in a relational database simultaneously. The form of an n-tuple in each 
of the above-mentioned fuzzy relational models can be expressed, respectively, as 

$$t = \langle \pi_{A_1}, \pi_{A_2}, \ldots, \pi_{A_n} \rangle, \qquad (2)$$

$$t = \langle a_1, a_2, \ldots, a_n, d \rangle, \quad \text{and} \qquad (3)$$

$$t = \langle \pi_{A_1}, \pi_{A_2}, \ldots, \pi_{A_n}, d \rangle, \qquad (4)$$

where a_i ∈ D_i with D_i being the domain of attribute A_i, d ∈ (0, 1], π_{A_i} is the 
possibility distribution of attribute A_i on its domain D_i, and π_{A_i} (x), x ∈ D_i, 
denotes the possibility that x is true. In this paper, we focus on the last type of 
fuzzy relational database. Attribute pD is used in a fuzzy relational schema to 
indicate the membership degrees of tuples. 
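A tuple of the third model, form (4), can be represented in Python as a mapping from attribute names to values, where a crisp value is stored as-is and a fuzzy value as a discrete possibility distribution {element: possibility}, together with a membership degree pD in (0, 1]. This is a minimal sketch; the class and helper names are hypothetical, and defaulting pD to 1 for crisp tuples reflects the view, used in Section 4.1, that crisp relations are special cases of fuzzy ones.

```python
from dataclasses import dataclass
from typing import Any, Dict

# A fuzzy attribute value: a discrete possibility distribution {element: possibility}.
Distribution = Dict[Any, float]


def is_fuzzy(value: Any) -> bool:
    """Crisp values are stored as plain values, fuzzy ones as distributions (dicts)."""
    return isinstance(value, dict)


@dataclass
class FuzzyTuple:
    values: Dict[str, Any]  # attribute name -> crisp value or Distribution
    pD: float = 1.0         # membership degree d in (0, 1]; 1.0 for a crisp tuple

    def __post_init__(self) -> None:
        if not 0.0 < self.pD <= 1.0:
            raise ValueError("membership degree pD must lie in (0, 1]")
        for name, v in self.values.items():
            if is_fuzzy(v) and not all(0.0 <= p <= 1.0 for p in v.values()):
                raise ValueError(f"possibilities of {name!r} must lie in [0, 1]")
```

A tuple such as <9540, John, "about 25", 0.8> would then be `FuzzyTuple({"ID": 9540, "Name": "John", "Age": {24: 0.8, 25: 1.0, 26: 0.8}}, pD=0.8)`, where the distribution chosen for "about 25" is purely illustrative.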



3 Conflicts in Fuzzy Multidatabase Systems 

Since fuzzy relational databases may coexist with crisp ones in a multidatabase system, 
and crisp relational databases are essentially special forms of fuzzy relational 
databases, there are new types of conflicts that should be resolved in schema 
integration together with the conflicts identified above. In this section, we 
investigate the conflicts that may occur in the schemas of fuzzy multidatabase systems. 

In the following discussion, let r and s be fuzzy component relations from different 
component databases, and let t_r and t_s be their tuples, called component tuples, 
respectively. Let r and s have a common key. Assume that the key takes no fuzzy values 
and that t_r and t_s have the same key values. 



3.1 Membership Degree Conflicts 

Membership degree conflicts occur at the tuple level and can be classified into two 
classes as follows: 

(a) Missing membership degree. Of t_r and t_s, one is associated with an attribute of 
membership degree, i.e., the tuple is fuzzy, but the other is not, i.e., the tuple is 
crisp. 

(b) Inconsistent membership degree. Both t_r and t_s have membership degrees, t_r [pD] 
and t_s [pD], respectively, but t_r [pD] ≠ t_s [pD]. 

Consider the following three relations r1, r2 and r3. There is a conflict of missing 
membership degree between tuples in r1 and r2, as well as between tuples in r1 and r3. 
On the other hand, there is a conflict of inconsistent membership degree between tuples 
in r2 and r3. 



Table 1. Relation r1 

ID     Name 
9540   John 

Table 2. Relation r2 

ID     Name   pD 
9540   John   0.8 

Table 3. Relation r3 

ID     Name   pD 
9540   John   0.5 
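The two classes of membership degree conflicts can be detected mechanically once matching tuples have been found. The sketch below is illustrative only: it assumes that a tuple from a crisp relation (one without a pD attribute) is passed as None, and the function name is hypothetical. Applied to the tuples with ID 9540 in Tables 1 to 3, it reproduces the conflicts named above.

```python
from typing import Optional


def membership_conflict(d_r: Optional[float], d_s: Optional[float]) -> Optional[str]:
    """Classify the tuple-level conflict between two tuples with the same key.

    d_r, d_s: membership degrees t_r[pD] and t_s[pD], or None for a crisp tuple.
    Returns the conflict name, or None when there is no membership degree conflict.
    """
    if (d_r is None) != (d_s is None):
        return "missing membership degree"       # one tuple fuzzy, the other crisp
    if d_r is not None and d_r != d_s:
        return "inconsistent membership degree"  # both fuzzy, degrees differ
    return None


r1, r2, r3 = None, 0.8, 0.5  # pD of tuple 9540 in Tables 1, 2 and 3
print(membership_conflict(r1, r2))  # missing membership degree
print(membership_conflict(r2, r3))  # inconsistent membership degree
```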



3.2 Attribute Value Conflicts in Identical Attribute Domains 

In addition to the conflicts at the level of tuples, there may exist conflicts at the levels 
of attribute domains and attribute values. First, let us look at the attribute value con- 
flicts, where the attributes with conflicts have the same domains. 

Let Ai and Aj be attributes with the same domains in r and s, respectively, and let 
t_r [Ai] and t_s [Aj] be semantically related to each other. 

(a) Inconsistent crisp attribute values. The values t_r [Ai] and t_s [Aj] are both 
crisp, but t_r [Ai] ≠ t_s [Aj]. For example, the age of Tom is 25 in relation r but 27 
in relation s. 

(b) Missing fuzzy attribute values. Of t_r [Ai] and t_s [Aj], one is fuzzy, based on a 
possibility distribution, while the other is crisp. For example, the age of Tom is 24 in 
relation r but "about 25" in relation s. 

(c) Inconsistent fuzzy attribute values. The values t_r [Ai] and t_s [Aj] are both 
fuzzy, but t_r [Ai] ≠ t_s [Aj]. For example, the age of Tom is "about 26" in relation r 
but "about 28" in relation s. 
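The three cases can be told apart by checking which of the two values are fuzzy. In this sketch, crisp values are plain Python values and fuzzy values are dicts {element: possibility}; the distributions given to "about 25", "about 26" and "about 28" are hypothetical and chosen only for illustration, since the paper does not specify them. The function name is also hypothetical.

```python
from typing import Any, Optional


def value_conflict(a: Any, b: Any) -> Optional[str]:
    """Classify the conflict between t_r[Ai] and t_s[Aj] on identical domains."""
    fuzzy_a, fuzzy_b = isinstance(a, dict), isinstance(b, dict)
    if fuzzy_a != fuzzy_b:
        return "missing fuzzy attribute values"      # case (b): one crisp, one fuzzy
    if a == b:
        return None                                  # identical values, no conflict
    if fuzzy_a:
        return "inconsistent fuzzy attribute values" # case (c): both fuzzy
    return "inconsistent crisp attribute values"     # case (a): both crisp


# Hypothetical possibility distributions for the linguistic values in the examples.
ABOUT_25 = {24: 0.8, 25: 1.0, 26: 0.8}
ABOUT_26 = {25: 0.8, 26: 1.0, 27: 0.8}
ABOUT_28 = {27: 0.8, 28: 1.0, 29: 0.8}

print(value_conflict(25, 27))              # inconsistent crisp attribute values
print(value_conflict(24, ABOUT_25))        # missing fuzzy attribute values
print(value_conflict(ABOUT_26, ABOUT_28))  # inconsistent fuzzy attribute values
```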
3.3 Missing Attributes 

Missing attributes mean that r and s have different attribute sets. In other words, an 
attribute in a component relation is not semantically related to any attribute in another 
component relation. 

Consider an example in which r is a relation on the schema {ID, Name, Age} and s is a 
relation on the schema {ID, Name, Major}. Attribute "Age" in r is a missing attribute 
of relation s, and attribute "Major" in s is a missing attribute of relation r. 



3.4 Attribute Name Conflicts 

Attribute name conflicts are naming conflicts. Let Ai and Aj be attributes in r and 
s, respectively. This type of conflict can be divided into two cases. 

(a) Semantically related attributes are named differently, i.e., synonyms. 

(b) Semantically unrelated attributes are named identically, i.e., homonyms. 

It should be noted that the conflicts of missing attributes and attribute names do not 
depend on whether the component relations are fuzzy. 



3.5 Attribute Domain Conflicts 

Data type conflicts and data scaling conflicts, mentioned above, are caused by 
inconsistent attribute domains. When there are fuzzy attribute values in component 
tuples, attribute domain conflicts become more complicated. Note that membership degree 
attributes have no attribute domain conflicts. 

Let Ai and Aj be attributes with different domains in r and s, respectively, and let 
t_r [Ai] and t_s [Aj] be semantically related to each other. 

(a) Data format conflicts. Although Ai and Aj have the same data type and data unit, 
they have different formats. For example, t_r [Ai] and t_s [Aj] both represent dates, 
but t_r [Ai] is in the form “22/05/98” while t_s [Aj] is “05/22/98”. 

(b) Data unit conflicts. Attributes Ai and Aj have the same data type, but their units 
of measure are different. For example, t_r [Ai] and t_s [Aj] are both real numbers, but 
t_r [Ai] is “22.4 kilogram” while t_s [Aj] is “22.9 pound”. 

(c) Data type conflicts. Attributes Ai and Aj have different data types. Therefore, we 
may have t_r [Ai] = 22 and t_s [Aj] = 21.9, which are integer and real, respectively. 
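One way to picture these three domain conflicts is as functions that map each source value into a common virtual domain, anticipating the virtual attributes of Section 4.3. The sketch assumes that the source format of each date is known (day/month/year for r, month/day/year for s) and picks dates, kilograms and real numbers as the common virtual domains; the function names are hypothetical. The pound factor is the exact international definition.

```python
from datetime import date, datetime

POUND_IN_KG = 0.45359237  # exact definition of the international avoirdupois pound


def virtual_date(text: str, fmt: str) -> date:
    """Data format conflict: parse a date string given its source-specific format."""
    return datetime.strptime(text, fmt).date()


def virtual_kg(amount: float, unit: str) -> float:
    """Data unit conflict: express a weight in the common virtual unit (kilograms)."""
    factors = {"kilogram": 1.0, "pound": POUND_IN_KG}
    return amount * factors[unit]


def virtual_real(value: float) -> float:
    """Data type conflict: map integer and real values to one type (real)."""
    return float(value)


# "22/05/98" in r and "05/22/98" in s denote the same day once the formats are known.
print(virtual_date("22/05/98", "%d/%m/%y") == virtual_date("05/22/98", "%m/%d/%y"))  # True
```

Note that `%y` maps two-digit years 69 to 99 onto 1969 to 1999, so "98" is read as 1998.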

When attribute domains have the above-mentioned conflicts, the attribute values 
necessarily conflict as well. Taking the fuzziness of attribute values into account, we 
distinguish the following cases. 

Case 1: The values t_r [Ai] and t_s [Aj] are both crisp. 

Case 2: Of t_r [Ai] and t_s [Aj], one is fuzzy, based on a possibility distribution, 
whereas the other is crisp. 

Case 3: The values t_r [Ai] and t_s [Aj] are both fuzzy. 
4 Conflict Resolution 

Among the above-mentioned conflicts, some, including missing attributes, attribute name 
conflicts, inconsistent crisp attribute values on identical attribute domains and 
inconsistent crisp attribute values on different attribute domains, have already been 
investigated and resolved [9, 13]. In this section, we focus on the new types of 
conflicts that arise in connection with fuzzy databases. 

Let r and s be fuzzy component relations from different component databases. Let t_r 
and t_s be component tuples belonging to r and s, respectively, and let t_r and t_s have 
the same crisp key values, namely, they describe the same object in the real world. 
Now, we integrate t_r and t_s to form a tuple t. Clearly, t has the same key and key 
values as t_r (or t_s). The other attribute values of t are formed after resolving the 
conflicts between semantically related attribute values. Here, we assume that there are 
no attribute name conflicts in r and s, because they can be resolved beforehand. 



4.1 Resolving Membership Degree Conflicts 

First, consider the situation of missing membership degree. Let t_r and t_s be tuples in 
r (K, X) and s (K, X, pD), respectively, where K stands for a key, X represents a set of 
common attributes, and pD is a membership degree attribute. Let t_r [K] and t_s [K] be 
crisp and t_r [K] = t_s [K]. Then t_r and t_s denote the same real-world object. Assume 
that t_r [X] and t_s [X] are either both crisp or both fuzzy. If t_r [X] and t_s [X] are 
fuzzy, they must be equivalent to each other. Clearly, there is a conflict of missing 
membership degree between t_r and t_s. For the tuple t formed by integrating t_r and 
t_s, the schema is {K, X, pD}, and t [K] = t_r [K] = t_s [K], t [X] = t_r [X] = t_s [X], 
and t [pD] = max (1, t_s [pD]) = 1. 

Now let us focus on the situation of inconsistent membership degree. Let r and s be 
r (K, X, pD) and s (K, X, pD), respectively, where K, X and pD have the same meanings as 
above. Let t_r [K] and t_s [K] be crisp and t_r [K] = t_s [K]. Assume that t_r [X] and 
t_s [X] are either both crisp or both fuzzy. If t_r [X] and t_s [X] are fuzzy, they must 
be equivalent to each other. Assume t_r [pD] ≠ t_s [pD]. Then t_r and t_s denote the 
same real-world object, and there is a conflict of inconsistent membership degree 
between t_r and t_s. For the tuple t, the schema is {K, X, pD}, and t [K] = t_r [K] = 
t_s [K], t [X] = t_r [X] = t_s [X], and t [pD] = max (t_r [pD], t_s [pD]). 
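Both rules can be expressed as a single function, because a crisp tuple behaves as a fuzzy tuple with membership degree 1, so max (1, t_s [pD]) and max (t_r [pD], t_s [pD]) are the same computation. This is a minimal sketch under the section's assumptions (equal crisp keys, equivalent X values); tuples are dicts, "pD" is the membership degree attribute, and the function name is hypothetical.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def integrate_membership(t_r: Row, t_s: Row, key: str) -> Row:
    """Integrate two tuples for the same object, resolving membership degree conflicts.

    A tuple without a 'pD' entry is crisp and is treated as having degree 1.
    """
    if t_r[key] != t_s[key]:
        raise ValueError("tuples do not describe the same real-world object")
    x_r = {a: v for a, v in t_r.items() if a != "pD"}
    x_s = {a: v for a, v in t_s.items() if a != "pD"}
    if x_r != x_s:
        raise ValueError("Section 4.1 assumes t_r[X] and t_s[X] are equivalent")
    t = dict(x_r)  # t[K] = t_r[K] = t_s[K] and t[X] = t_r[X] = t_s[X]
    t["pD"] = max(t_r.get("pD", 1.0), t_s.get("pD", 1.0))
    return t
```

With the tuples of Tables 1 to 3, integrating r1 with r2 gives pD = 1.0 (missing degree), and integrating r2 with r3 gives pD = 0.8 (inconsistent degrees).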



4.2 Resolving Attribute Value Conflicts in Identical Attribute Domains 

Let t_r and t_s be component tuples in r (K, X) and s (K, X), respectively, where K is a 
key and X is a set of common attributes. To simplify the discussion, membership degree 
attributes are not considered here; if they are included, the potential conflicts can be 
resolved by applying the above methods. Assume that t_r [K] and t_s [K] are crisp and 
t_r [K] = t_s [K]. Then the schema of the integrated target relation is {K, X} and 
t [K] = t_r [K] = t_s [K]. Let A ∈ X; then 
(a) When t_r [A] and t_s [A] are crisp and t_r [A] ≠ t_s [A], the conflict of 
inconsistent crisp attribute values occurs and t [A] = [t_r [A], t_s [A]], which is a 
partial value [9]. Of course, if t_r [A] = t_s [A], then t [A] = t_r [A] = t_s [A]. 

(b) When t_r [A] and t_s [A] are crisp and fuzzy, respectively, the conflict of missing 
fuzzy attribute values occurs. Assume that t_r [A] is crisp and t_s [A] is fuzzy. Then 
t [A] = t_r [A]. 

(c) When both t_r [A] and t_s [A] are fuzzy and t_r [A] ≠ t_s [A], the conflict of 
inconsistent fuzzy attribute values occurs and t [A] = t_r [A] ∪_f t_s [A], where "∪_f" 
is a fuzzy union operation [24]. Of course, if t_r [A] = t_s [A], then t [A] = t_r [A] = 
t_s [A]. The fuzzy union operation, rather than the intersection or difference 
operation, is adopted to avoid losing information during integration. The union of two 
fuzzy values on the same universe of discourse U, say A and B with possibility 
distributions π_A and π_B, is still a fuzzy set on U with the possibility distribution 
π_{A∪B}: U → [0, 1], where 

$$\forall u \in U,\quad \pi_{A \cup B}(u) = \max\big(\pi_A(u),\ \pi_B(u)\big). \qquad (5)$$
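The three resolution rules (a) to (c), together with the fuzzy union of Eq. (5), can be sketched as follows. Crisp values are plain values, fuzzy values are dicts {element: possibility}, and a partial value is modelled by a small hypothetical `PartialValue` class. Elements missing from one distribution are treated as having possibility 0, the usual convention when two discrete distributions on U are combined; the function names are also hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict

Distribution = Dict[Any, float]


@dataclass(frozen=True)
class PartialValue:
    """A partial value [9]: exactly one of the candidates is the true value."""
    candidates: frozenset


def fuzzy_union(a: Distribution, b: Distribution) -> Distribution:
    """Eq. (5): pi_{A u B}(u) = max(pi_A(u), pi_B(u)); absent elements count as 0."""
    return {u: max(a.get(u, 0.0), b.get(u, 0.0)) for u in {**a, **b}}


def resolve_value(a: Any, b: Any) -> Any:
    """Resolve t_r[A] against t_s[A] on an identical domain (Section 4.2)."""
    fuzzy_a, fuzzy_b = isinstance(a, dict), isinstance(b, dict)
    if not fuzzy_a and not fuzzy_b:
        # (a) crisp vs crisp: keep an equal value, otherwise form a partial value.
        return a if a == b else PartialValue(frozenset({a, b}))
    if fuzzy_a and fuzzy_b:
        # (c) fuzzy vs fuzzy: the union keeps every possibility from both sources.
        return fuzzy_union(a, b)
    # (b) crisp vs fuzzy: the crisp value is taken.
    return b if fuzzy_a else a
```

Taking the union also covers the equal case, since the union of a distribution with itself is unchanged.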



4.3 Resolving Attribute Value Conflicts in Inconsistent Attribute Domains 

In order to resolve attribute value conflicts in inconsistent attribute domains, the 
conflicts of attribute domains should be resolved first. For this purpose, the component 
relations are converted into other relations, called virtual component relations. The 
attributes in virtual component relations are called virtual attributes [9, 23]. Note 
that there are no attribute domain conflicts in virtual component relations, because 
these conflicts have been resolved by mapping each attribute involved in a domain 
conflict in an original component relation to the corresponding virtual attribute. 
Clearly, such mappings must also be applied between a tuple in an original component 
relation and the corresponding tuple in the virtual component relation, called a virtual 
tuple, or more precisely between an attribute value and the value of the corresponding 
virtual attribute. Instead of the original component relations, their virtual component 
relations are integrated to form the target relation. 

According to the type of attribute domain conflict, the above-mentioned mappings can be 
classified into one-to-one, many-to-one, and one-to-many mappings. A one-to-one mapping 
produces a certain (single) result for each data item. Therefore, a crisp attribute 
value in an original component relation is mapped into another crisp value of the 
corresponding virtual attribute, and a fuzzy attribute value in an original component 
relation is mapped into another fuzzy value of the corresponding virtual attribute. 
These two fuzzy values, represented by possibility distributions, differ only in their 
supports, whose elements are in one-to-one correspondence. Each pair of corresponding 
elements has the same possibility. 

Let us look at an example. Let the original component tuple be t_r and the corresponding 
virtual component tuple be t'_r. Let A be an attribute in the schema of t_r and A' be the 
corresponding virtual attribute. They have integer domains with units “cm” and “mm”, 
respectively. Assume t_r [A] = {0.5/10, 0.7/11, 1.0/12, 0.9/13, 0.8/14, 0.6/15}. 
Utilizing one-to-one mapping, t_r [A] is mapped into t'_r [A'] = {0.5/100, 0.7/110, 
1.0/120, 0.9/130, 0.8/140, 0.6/150}. 
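The one-to-one mapping can be sketched as a function that renames the support elements while keeping their possibilities. This sketch assumes the mapping is supplied as an injective Python function (here a hypothetical `cm_to_mm`); it reproduces the cm-to-mm example above.

```python
from typing import Any, Callable


def map_one_to_one(value: Any, f: Callable[[Any], Any]) -> Any:
    """Map an original attribute value into the virtual attribute's domain.

    f must be injective (one-to-one). A crisp value maps to a crisp value; a fuzzy
    value keeps its possibilities, and only the support elements are renamed.
    """
    if isinstance(value, dict):
        return {f(u): p for u, p in value.items()}
    return f(value)


def cm_to_mm(x: int) -> int:
    return x * 10


t_r_A = {10: 0.5, 11: 0.7, 12: 1.0, 13: 0.9, 14: 0.8, 15: 0.6}
print(map_one_to_one(t_r_A, cm_to_mm))
# {100: 0.5, 110: 0.7, 120: 1.0, 130: 0.9, 140: 0.8, 150: 0.6}
```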

The many-to-one mapping also produces a certain result for each data item. A crisp 
attribute value in an original component relation is mapped into another crisp value of 
the corresponding virtual attribute. A fuzzy attribute value in an original component 
relation is mapped into another fuzzy value of the corresponding virtual attribute. 
However, because the mapping is many-to-one, several elements of the support on the 
many side are mapped into one element of the support on the one side; the final 
possibility of that element is then the maximum of the possibilities of all the 
elements mapped onto it. 

Let us look at another example. Let t_r, t'_r, A, and A' be the same as above, except 
that A and A' now have real and integer domains, respectively. Assume that t_r [A] = 
{0.4/10.4, 0.5/10.8, 0.6/11.2, 0.7/11.6, 0.8/12.0, 0.9/12.4, 1.0/12.8, 0.9/13.2, 0.8/13.6, 
0.7/14.0, 0.6/14.4, 0.5/14.8, 0.4/15.2}. Utilizing many-to-one mapping, t_r [A] is 
mapped into t'_r [A'] = {0.4/10, 0.6/11, 0.9/12, 1.0/13, 0.8/14, 0.5/15}. 
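The many-to-one rule, keeping the maximum possibility among the elements that collapse onto one target element, can be sketched as below. The paper does not state how reals are mapped to integers; this sketch assumes round-to-nearest, which reproduces the result above. Python's built-in `round` rounds halves to even, which does not matter here because no value ends in .5. The function name is hypothetical.

```python
from typing import Any, Callable, Dict


def map_many_to_one(value: Any, f: Callable[[Any], Any]) -> Any:
    """Map a value through a many-to-one f, keeping the largest possibility per image."""
    if not isinstance(value, dict):
        return f(value)  # crisp value -> crisp value
    mapped: Dict[Any, float] = {}
    for u, p in value.items():
        v = f(u)
        # Several source elements may land on v: keep the largest possibility.
        mapped[v] = max(mapped.get(v, 0.0), p)
    return mapped


t_r_A = {10.4: 0.4, 10.8: 0.5, 11.2: 0.6, 11.6: 0.7, 12.0: 0.8, 12.4: 0.9, 12.8: 1.0,
         13.2: 0.9, 13.6: 0.8, 14.0: 0.7, 14.4: 0.6, 14.8: 0.5, 15.2: 0.4}
print(map_many_to_one(t_r_A, round))
# {10: 0.4, 11: 0.6, 12: 0.9, 13: 1.0, 14: 0.8, 15: 0.5}
```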

Data format conflicts, data type conflicts, and some data unit conflicts can be resolved 
by utilizing one-to-one and many-to-one mappings. Since the virtual component relations, 
rather than the original component relations, are integrated to form the target 
relation, and they have no attribute domain conflicts, the remaining attribute value 
conflicts are those in identical attribute domains. These can then be resolved using 
the methods discussed in Section 4.2. 

It should be noted that some data unit conflicts can only be resolved by utilizing a 
one-to-many mapping. A one-to-many mapping produces a list or set of values for each 
data item. A crisp attribute value in the original component relation is mapped into a 
partial value of the corresponding virtual attribute [9, 18, 23]. A partial value can be 
regarded as a special case of a fuzzy value in which the possibility of each element is 
one. A fuzzy attribute value in an original component relation is mapped into another 
fuzzy value of the corresponding virtual attribute. However, because the mapping is 
one-to-many, one element in the support of the former is mapped into several elements 
in the support of the latter, and the possibility of each of these elements is the 
possibility of the original element. 

Let us look at an example. Let t_r be the original component tuple and A be an attribute, 
denoting forms of transport, in the schema of t_r. Assume that the domain of A is 
{Land, Air, Water} and t_r [A] = {0.4/Air, 0.7/Land, 0.9/Water}. If the domain of the 
virtual attribute A' corresponding to A is {Train, Truck, Plane, Ship}, then t_r [A] is 
mapped into t'_r [A'] = {0.4/Plane, 0.7/Train, 0.7/Truck, 0.9/Ship} by utilizing 
one-to-many mapping. 
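The one-to-many rule can be sketched in the same style. The correspondence Land to {Train, Truck}, Air to {Plane} and Water to {Ship} is inferred from the example above, and the `TRANSPORT` table and function name are hypothetical. A crisp value becomes a partial value, represented here simply as a frozenset whose elements are the candidates. For a true one-to-many mapping the images of different source elements are disjoint, so each image just inherits its source's possibility; the `max` only matters if images were to overlap.

```python
from typing import Any, Callable, Dict, Iterable


def map_one_to_many(value: Any, f: Callable[[Any], Iterable[Any]]) -> Any:
    """Map a value through a one-to-many f into the virtual attribute's domain."""
    if not isinstance(value, dict):
        # A crisp value becomes a partial value: exactly one image is the true value.
        return frozenset(f(value))
    mapped: Dict[Any, float] = {}
    for u, p in value.items():
        for v in f(u):
            # Each image inherits the possibility of its source element
            # (max is only a safeguard in case two sources share an image).
            mapped[v] = max(mapped.get(v, 0.0), p)
    return mapped


TRANSPORT = {"Land": ("Train", "Truck"), "Air": ("Plane",), "Water": ("Ship",)}
t_r_A = {"Air": 0.4, "Land": 0.7, "Water": 0.9}
print(map_one_to_many(t_r_A, TRANSPORT.__getitem__))
# {'Plane': 0.4, 'Train': 0.7, 'Truck': 0.7, 'Ship': 0.9}
```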

Once one-to-many mappings have resolved these data unit conflicts, the virtual component 
relations to be integrated have no attribute domain conflicts. The attribute value 
conflicts in the virtual component relations are then again those in identical 
attribute domains, which can be resolved using the methods discussed in Section 4.2. 
5 Conclusion 

The semantic conflicts in fuzzy multidatabase systems have been identified, and 
resolution methodologies have been investigated in this paper. These results can serve 
as the foundation for constructing multidatabase systems that can model and process 
both crisp information and imprecise and uncertain information. 

Integrity constraints, querying, and updating in fuzzy multidatabase systems are also 
interesting issues, which we will address in future work. In addition, the component 
databases considered in this paper are only fuzzy relational databases. Fuzzy 
relational databases and fuzzy object-oriented databases may coexist in a multidatabase 
system. Semantic conflicts and their resolution in such an environment are another 
direction for future work. 
Abstract. Search engines are currently the standard medium for locating 
and accessing information on the Web. However, they may not scale 
to match the anticipated explosion of Web content since they support 
only extremely coarse-grained queries and are based on centralized ar- 
chitectures. In this paper, we discuss how database technology can be 
successfully utilized to address the above problems. We also present the 
main features of a prototype Web database system called DIASPORA 
that we have developed and tested on our campus network. This system 
supports fine-grained querying and implements a distributed processing 
architecture. 



1 Introduction 

In 1998, there were about 320 million documents on the Web, and this number 
grew to 800 million in 1999, comprising over 15 terabytes of information. During 
the same period, search engine coverage dropped from about 30% to about 15%. 
These statistics, reported in [13,14], clearly indicate that the Web is experiencing 
tremendous growth (in fact, it is anticipated to grow by 1000 percent in the next 
couple of years [15]) and that search engines are proving unequal to the challenge, 
covering less and less of the document space. This growth also means that search 
engines will return more and more documents to the user for the same query, 
resulting in the “data deluge” problem. This problem largely arises because search 
engines support only extremely coarse-grained queries and do not allow users to 
express their full knowledge about the domain space, which would restrict the 
number of answers returned for the query. Finally, since search engines typically 
implement a centralized indexing and query-processing architecture, they are 
inherently ill-suited to scaling, whether in the amount of data handled or in the 
volume of user requests. All in all, it appears that search engines will soon run 
out of steam as the mechanism of choice for locating and accessing Web data. 

In this paper, we investigate how database technology can be utilized to 
address the above data management and access problems. We consider various 
ways in which database technology and Web technology can be integrated and 
highlight the design challenges involved in a successful integration. We also 
summarize the main features of DIASPORA, a new Web database system that we 
have developed, which supports fine-grained querying and implements a distributed 
processing architecture. A Java-based prototype of DIASPORA is currently 
operational and is undergoing field trials on our campus network. Initial 
performance results indicate significant improvements in both the quality of 
answers to user queries and the resources required to generate these answers. 

The remainder of this paper is organized as follows. Background information 
on search engines and their limitations is given in Section 2. An overview of 
different approaches to combining Web and database technologies is presented in 
Section 3. Our new DIASPORA system is described in Section 4. Finally, in 
Section 5, we present the conclusions of our study. 

2 Search Engines and Their Limitations 

Search engines are currently the primary mechanism for accessing Web-based 
information, implementing index servers that provide URL references to docu- 
ments. Users send query strings to these engines, which are applied against their 
indices to generate a ranked list of URLs of those documents that may have some 
related information. Well-known search engines include AltaVista, Excite, 
Yahoo, etc. While search engines have contributed tremendously to the popularity 
of the Web as a publishing medium, they do suffer from a variety of limitations: 

1. Each individual search engine covers only a small part of the Web, resulting 
in engine-specific answers to user queries. This forces users to query multiple 
search engines (each of which has its own data-entry format) in order to have 
reasonable coverage. 

2. The query predicates are extremely coarse, operating primarily at the level 
of keywords. This makes it difficult for users to express sophisticated queries 
as well as to utilize any domain knowledge that they may have to eliminate 
irrelevant answers. For example, a query of the form “Find the pages listing 
the faculty members from all the departments in Indian Institute of Science 
(IISc)” that involves both structural predicates (restricting the search to 
web-sites in IISc) and content predicates (faculty member pages) is not ex- 
pressible in search engine interfaces. 

3. Search engines are based on a centralized architecture in which all user requests 
are fed to a small set of servers. This means that as the Web grows and the 
number of users increases, these engines will become the bottleneck for efficient 
location and access of Web resources. In particular, the choice of a 
“data shipping” approach suffers from several disadvantages, similar to those 
already observed for traditional distributed database systems, including the 
transfer of large amounts of unnecessary data resulting in network conges- 
tion and poor bandwidth utilization, the client-site becoming a processing 
bottleneck, and extended user response times due to sequential processing. 

In the remainder of this paper, we investigate how the above-mentioned lim- 
itations of search engines can be addressed by suitably integrating database 
technology into the Web query-processing framework. 
3 Interfacing the Web and Databases 

There are a variety of ways in which database technology and Web technology 
have come together, which we classify as “Database on Web”, where the Web is used 
as a communication medium; “Web on Database”, where Web servers use database 
backends to host their content; and “Web is Database”, where the Web itself forms 
the database. We discuss these frameworks in this section (for further material on 
database techniques for the Web, refer to [6]). 

3.1 Database on Web 

In this approach, the Web is used primarily as a communication medium for 
transporting queries and data between networked users and relational database 
servers. The main challenge here is the design of interfaces that facilitate
embedding SQL queries and their results into HTML, and several such interfaces
have been developed, for example, the WWW Connection interface for IBM’s
DB2 system [19]. This is explained in more detail below.

Organizations typically maintain their corporate information using database
systems such as DB2 or Oracle. Using this backend database, they provide
different views of the organization to different groups of people (e.g.,
employees, customers and the general public). These organizations exploit the
low cost and easy accessibility of the Web: their Web servers are equipped with
Common Gateway Interface (CGI) techniques [23], enabling each group of users to
access its corresponding view of the database [19] through HTML forms [20].

The mode of interaction is as follows. The Web server provides an initial
HTML form, which the user fills in and submits. The form is processed by
executing a CGI script at the server, which produces another related HTML
document, possibly another form. This document corresponds to the view of the
database/organization that is valid for the user’s identity and requirements.
Communicating in this manner between the client’s browser and the web server
lets an organization give its related users access to its database from
anywhere on the Web. This has also led to the concept of a closed internet
within the organization, known as an intranet.
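To make the form-to-view interaction concrete, here is a minimal Python sketch of what such a CGI script might do. It is an illustration under assumptions, not the DB2 WWW Connection interface: the `staff` table, the `audience` and `dept` form fields and the two per-audience views are hypothetical, and SQLite stands in for the backend DBMS.

```python
import html
import os
import sqlite3
from urllib.parse import parse_qs

# Hypothetical per-audience views over one backend table.
VIEWS = {
    "public": "SELECT name, dept FROM staff WHERE dept = ?",
    "employee": "SELECT name, dept, phone FROM staff WHERE dept = ?",
}


def handle_form(query_string, conn):
    """Map submitted form fields to the view allowed for this audience and
    render the answer as a new HTML document (which again contains a form)."""
    form = parse_qs(query_string)
    audience = form.get("audience", ["public"])[0]
    dept = form.get("dept", [""])[0]
    sql = VIEWS.get(audience, VIEWS["public"])      # unknown audience: public view
    rows = conn.execute(sql, (dept,)).fetchall()    # dept bound as a parameter
    table = "".join(
        "<TR>" + "".join(f"<TD>{html.escape(str(v))}</TD>" for v in row) + "</TR>"
        for row in rows)
    return ("<HTML><BODY><TABLE>" + table + "</TABLE>"
            '<FORM method="GET"><INPUT name="dept"><INPUT type="submit"></FORM>'
            "</BODY></HTML>")


def cgi_main(conn):
    """CGI entry point: the server passes GET form data in QUERY_STRING."""
    print("Content-Type: text/html")
    print()
    print(handle_form(os.environ.get("QUERY_STRING", ""), conn))
```

Binding the form value as a `?` parameter, rather than pasting it into the SQL text, keeps user input from altering the query. In practice the audience would come from the authenticated user identity rather than from a form field the user can edit.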

The implementation of the Database on Web framework in DB2 WWW Connection
[19] is shown in Fig. 1.

3.2 Web on Database 

For larger organizations, keeping their web sites as a collection of HTML files
is a major maintenance problem. They need to dynamically generate HTML
documents, equip their sites with data/content query-processing power, analyze
their log records to improve and tune their performance, and organize their
sites better. This motivates the move from a standard file system to a
database system [8,5]. The web server or HTTP daemon process is equipped with
features similar to those of a database, to index HTML documents in a manner
that is transparent to the HTTP requests. These databases are required to
handle complexities such as heterogeneous data (HTML files, images, applets,
associated CGI scripts, etc.) and their indexing, querying, caching, etc.

Fig. 1. Database on Web

Illustra’s Web Datablade System, which provides a toolset to maintain 
database-driven web-sites [8], is shown in Fig. 2. 
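A minimal sketch of the same idea in Python, with SQLite standing in for the site database: pages are rows rather than files, a word table supports content queries, and every request is logged for later tuning. The schema and function names are hypothetical and are not the API of Illustra's Web DataBlade.

```python
import html
import sqlite3
import time


def make_site_db():
    """Hypothetical schema: pages, a word index over them, and an access log."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page(url TEXT PRIMARY KEY, title TEXT, body TEXT);
        CREATE TABLE term(word TEXT, url TEXT);
        CREATE INDEX term_word ON term(word);
        CREATE TABLE access_log(url TEXT, ts REAL);
    """)
    return conn


def add_page(conn, url, title, body):
    """Store a page and index every distinct (lowercased) word in it."""
    conn.execute("INSERT INTO page VALUES (?, ?, ?)", (url, title, body))
    words = {w.lower() for w in f"{title} {body}".split()}
    conn.executemany("INSERT INTO term VALUES (?, ?)", [(w, url) for w in words])


def serve(conn, url):
    """Answer a GET for `url`: log the access, then generate the HTML."""
    conn.execute("INSERT INTO access_log VALUES (?, ?)", (url, time.time()))
    row = conn.execute("SELECT title, body FROM page WHERE url = ?",
                       (url,)).fetchone()
    if row is None:
        return None                      # the HTTP server would answer 404
    title, body = (html.escape(s) for s in row)
    return f"<HTML><HEAD><TITLE>{title}</TITLE></HEAD><BODY>{body}</BODY></HTML>"


def pages_about(conn, word):
    """Content query: URLs of pages containing `word`."""
    rows = conn.execute(
        "SELECT DISTINCT url FROM term WHERE word = ? ORDER BY url",
        (word.lower(),))
    return [url for (url,) in rows]
```

The client still sees ordinary HTML responses; the database only changes how the server stores, finds and monitors its content.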



3.3 Web Is Database 

In this last approach, which we focus on for the rest of this paper, we view the 
Web itself as an enormous (and potentially the largest in the world) database 
of information. A pictorial representation is shown in Fig. 3. 

Porting classical database technology onto the Web is difficult because of
the heterogeneous, dynamic, hyperlinked and largely unstructured nature of the
Web and its contents. Further, the absence of a controlling entity equivalent
to a database administrator makes it impossible to regulate the growth of the
Web. In Section 4, we present a new Web database system that takes a first step
towards providing an integrated and novel solution to these problems.





The Web Is the Database 








Fig. 2. Web on Database 



4 The DIASPORA System 

In designing a database system that addresses the above challenges, the pri- 
mary research issues that arise include the development of a data model that 
elegantly represents Web documents, a query language that enables users to eas- 
ily process information represented according to this data model, and a query 
processor that can efficiently execute these user queries. We highlight here the 
main features of DIASPORA (Distributed Answering System for Processing 
of Remote Agents), a new Web database system that attempts to provide an 
integrated and novel solution to the modeling, language and processing issues. 
The complete details of this system have been published in [21,22]. A Java-based
prototype of DIASPORA has been implemented and tested on our campus network.
Initial performance results indicate significant improvements both in the
quality of answers to user queries and in the resources required to
generate these answers.
Fig. 3. Web is Database



4.1 Data Model 

DIASPORA implements a data model wherein each Web document is represented
as a rooted, directed and edge-labelled graph [4], called a doc-graph. At each
site, the graphs of all the individual documents hosted at the site are integrated
to form another rooted, directed and edge-labelled graph, called a site-graph. A
set of simple heuristics is used to “wrap” the data in the base HTML documents
to conform to this data model.

Doc-graphs are intended to capture, in a hierarchical manner, the relationships
between the elements in a Web document. While in XML the metadata is explicit
in the tags, HTML documents pose more difficulties since they carry only display
information. We address this problem by using a set of heuristics to infer the
metadata; these heuristics utilize both the document structure and its contents.
For example, section headings are regarded as metadata for the contents of the
associated sections, since they describe what each section contains. Similarly,
the title of a document (enclosed in the <TITLE> tag) is used as the metadata
label for the entire document. The primary advantage of our approach is that it
permits automated generation of the doc-graph, which is especially attractive
when these graphs have to be generated for sites hosting a large number of
documents.

An example of the doc-graph generation process is presented in Figs. 4(a) 
and 4(b), which show part of an HTML document (from our lab’s web-site) and 
its corresponding doc-graph (italics represent links), respectively. 



<HTML>
<HEAD>
<TITLE>DATABASE SYSTEMS LAB PEOPLE</TITLE>
</HEAD>
<BODY>
<H1>CONVENER</H1>
<UL>
<LI><A href="http://dsl.serc.iisc.ernet.in/~haritsa">Jayant Haritsa</A>
</UL>
<H1>CURRENT MEMBERS</H1>
<H2>PhD</H2>
<UL>
<LI><A href="http://dsl.serc.iisc.ernet.in/~vikram">Vikram Pudi (SERC)</A>
</UL>
<H2>MSc(Engg)</H2>
<UL>
<LI><A href="http://dsl.serc.iisc.ernet.in/~maya">Maya Ramanath (SERC)</A>
<LI>B. J. Srikanta(SERC)
</UL>



(a) Portion of an HTML Document 




Fig. 4. An HTML Document and its Graph Representation 
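The following Python sketch shows how heuristics of this kind could be applied automatically to a document like that of Fig. 4(a), using the standard-library `html.parser`. The rules are assumptions chosen for illustration (the <TITLE> labels the root, <H1>...<H6> headings become nested section nodes, list items become leaves, and an anchor inside a list item is recorded as a link); they are not DIASPORA's actual heuristic set. Each node stores the label of the edge that enters it, which is equivalent to edge labelling for a tree.

```python
from html.parser import HTMLParser


class DocGraphBuilder(HTMLParser):
    """Heuristically wrap an HTML page as a doc-graph (here a tree of dicts)."""

    HEADINGS = {f"h{n}": n for n in range(1, 7)}

    def __init__(self):
        super().__init__()
        self.root = {"label": None, "children": []}
        self.stack = [(0, self.root)]   # open sections as (heading level, node)
        self.capture = None             # tag whose text is being collected
        self.buf = []
        self.href = None

    def _text(self):
        return " ".join("".join(self.buf).split())

    def _flush_item(self):
        """Emit the pending list item (HTML allows <LI> to be left unclosed)."""
        if self.capture == "li":
            if self._text():
                leaf = {"label": self._text(), "href": self.href}
                self.stack[-1][1]["children"].append(leaf)
            self.capture = None

    def handle_starttag(self, tag, attrs):   # tag names arrive lowercased
        if tag == "li" or tag in self.HEADINGS:
            self._flush_item()
        if tag in ("title", "li") or tag in self.HEADINGS:
            self.capture, self.buf, self.href = tag, [], None
        elif tag == "a" and self.capture == "li":
            self.href = dict(attrs).get("href")

    def handle_data(self, data):
        if self.capture:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self.capture == "title":
            self.root["label"] = self._text()     # title labels the whole document
            self.capture = None
        elif tag in self.HEADINGS and self.capture == tag:
            level = self.HEADINGS[tag]
            while self.stack[-1][0] >= level:     # close same-level/deeper sections
                self.stack.pop()
            node = {"label": self._text(), "children": []}
            self.stack[-1][1]["children"].append(node)
            self.stack.append((level, node))
            self.capture = None
        elif tag in ("li", "ul", "ol"):
            self._flush_item()
```

Feeding the HTML of Fig. 4(a) to `DocGraphBuilder().feed(...)` yields a root labelled with the page title, with CONVENER and CURRENT MEMBERS as section nodes and PhD and MSc(Engg) nested under the latter.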
We now explain how to build a site-graph from the set of doc-graphs associated
with the documents hosted at a web-site. Like the doc-graphs, the site-graph
is also a rooted, directed and edge-labelled graph, and is constructed using the
following procedure:

— The site-graph is initialized to be the doc-graph of the home-page of the 
web-site. 

— The “floating edge” corresponding to each “local link” (anchor that points 
to a document in the same web-site) in the home-page is terminated in the 
root of the doc-graph associated with the document pointed to by that link. 

— The above process is recursively executed for each of the documents that 
have been added to the site-graph, and terminates when all the documents 
reachable from the home-page have been included in the site-graph. Fig. 5 
shows an example. Each box in the figure refers to a different document 
which has been converted into a doc-graph. The words in italics in the figure 
denote the labels of hyperlinks. 
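The procedure can be sketched in Python as a breadth-first traversal from the home page. The representation is hypothetical: each doc-graph is a dictionary of labelled edges plus its outgoing anchors, a link counts as local when its URL starts with the site's prefix, and links leaving the site are kept as floating edges. Visiting each document only once also keeps the procedure from looping on cyclic links.

```python
from collections import deque


def build_site_graph(home, docs, site_prefix):
    """Merge the doc-graphs of one web-site into a site-graph.

    docs maps URL -> {"root": node, "edges": [(src, label, dst)],
                      "links": [(src_node, anchor_label, href)]}.
    Site-graph nodes are (url, node) pairs, so node names from different
    documents cannot collide."""
    edges, floating = [], []
    included, queue = {home}, deque([home])
    while queue:
        url = queue.popleft()
        doc = docs[url]
        edges += [((url, s), label, (url, d)) for s, label, d in doc["edges"]]
        for src, label, href in doc["links"]:
            if href.startswith(site_prefix) and href in docs:
                # Local link: terminate the floating edge at the target's root.
                edges.append(((url, src), label, (href, docs[href]["root"])))
                if href not in included:
                    included.add(href)
                    queue.append(href)
            else:
                # Global (or dangling) link: the edge stays floating.
                floating.append(((url, src), label, href))
    return (home, docs[home]["root"]), edges, floating, included
```

Documents that are hosted at the site but not reachable from the home page are left out, as the procedure requires.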




It is natural to ask whether the site-graphs of multiple sites should also be
connected together to form a “domain-graph”. We stop at building site-graphs
because of our query processing strategy, described in Section 4.3: since it
adopts a query-shipping approach in which queries visit the various web-sites,
it is sufficient to maintain a site-graph at each site.
4.2 Query Language

We now move on to the query language supported by DIASPORA. The objec- 
tives in our language design were the following: 

1. To enable the user to (a) express the content she is searching for through 
“hints” (in the form of keywords) to the query processor, and (b) express 
through “traversal expressions” any information she may have regarding the 
structural relationships among the web-sites where she wants the query to 
be processed. 

The ability to provide the query processor with hints and traversal expres- 
sions is critical to preventing the common problem faced by search engine 
users, namely, that of being deluged by a mass of results with no way of 
easily determining which few among these constitute the relevant set. 

2. To present the results as a weakly connected graph that helps the user to 
“place” each result keyword - that is, to know where the keyword is located 
within the “big picture” of the Web document organization. This feature is 
especially helpful for users who are querying the Web database system in 
an interactive fashion, that is, using the results of a query as the basis on 
which to form more refined queries, and so on until eventually the desired 
information is reached. This is because the placement helps them determine 
the path, which if browsed, is most likely to lead to the desired information. 
For example, suppose the user asks for publications on “databases”
and gives the IISc homepage as the starting point for the search. The result
graph would then include a path from the IISc homepage to the SERC department
homepage, from the SERC homepage to the Database Systems Lab homepage, and
from there to the publications page that lists the publications on
“databases”. Such a placement helps the user determine whether the result is
what she wants, and also what other information she is likely to find if she
decides to browse along that path.

To illustrate our solution to the above design criteria, the following is the
expression in DIASPORA for the example query mentioned in Section 2 (i.e.,
find the pages listing the faculty members from all the departments in IISc):

1. SELECT
2. { "*department*", "*faculty*" }
3. START
4. http://www.iisc.ernet.in
5. WHERE
6. DEFINE DeptLink AS LINK("*department*");
7. DEFINE Dept AS KEYWORD("*department*");
8. DOC_OF(START) DeptLink DOC_OF(Dept);
9. DOC_OF(Dept) G-G*1 SUBGRAPH_OF("*faculty*");

The purpose of this query is to “gather” information; it shows how the user’s
knowledge regarding the hyperstructure of the Web can be used in formulating
such queries. The user specifies her domain knowledge as follows:
— There is a path from the IISc homepage to the page listing all departments
of IISc (departments page) through a hyperlink containing the word department.
Thus, in order to locate the list of departments, start from the IISc
homepage and traverse the hyperlink containing the keyword department.

— Each department listed in the departments page is a hyperlink which leads 
to the homepage of the department. 

— Information about faculty members is found either at the department’s 
homepage or a web-site directly reachable from the department’s homepage. 

The query contains a SELECT clause which states that the keywords of 
interest are “department” and “faculty”. Then, each item of the user’s knowledge 
is expressed as follows: 

— Lines 6 and 7 of the query simply define a hyperlink (DeptLink) which
contains the keyword “department” and a keyword (Dept) containing the term
“department”.

— Line 8 tells the query processor to start with the IISc homepage and then 
traverse the link DeptLink in order to find the document containing the 
keyword Dept. 

— Line 9 tells the query processor to follow at least one global hyperlink from 
the current page and search for “faculty” in a resulting document reachable 
by following at most one global hyperlink from the resulting document. 

In lines 8 and 9 we have used DOC_OF and SUBGRAPH_OF. These are
collectively known as Scopes of Traversal and Search. When a scope occurs on
the LHS of a traversal expression, it denotes the traversal scope, and when it
occurs on the RHS of a traversal expression, it denotes the search scope. Line
8 effectively states: “start from the document corresponding to START,
traverse DeptLink, and then restrict your search for Dept to the document
reached”. Line 9 states: “starting from the document corresponding to Dept,
follow G-G*1 and then search the subgraph of the destination reached for
‘*faculty*’”. In short, we use scopes to restrict or expand the search space
and/or the traversal space. A more detailed description of scopes is given in [21].
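As a rough illustration of how traversal and search scopes combine in line 9, the Python sketch below evaluates it over a toy in-memory web. Several points are assumptions rather than DIASPORA's definitions: links are typed 'G' (global) or 'L' (local), G-G*1 is read as one global link followed by at most one more, SUBGRAPH_OF is taken to cover everything reachable through local links, and keyword patterns are matched as shell-style wildcards against lowercased document text.

```python
from collections import deque
from fnmatch import fnmatchcase


def follow_global(web, start, lo, hi):
    """Documents reached from `start` by paths of lo..hi global ('G') links."""
    found, frontier = set(), {start}
    for hops in range(1, hi + 1):
        frontier = {dst for doc in frontier
                    for kind, dst in web[doc]["links"] if kind == "G"}
        if hops >= lo:
            found |= frontier
    return found


def subgraph_of(web, doc):
    """Search scope SUBGRAPH_OF(doc), read here as: doc plus every document
    reachable from it through local ('L') links."""
    seen, queue = {doc}, deque([doc])
    while queue:
        for kind, dst in web[queue.popleft()]["links"]:
            if kind == "L" and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def evaluate_line9(web, dept_doc, keyword, lo=1, hi=2):
    """Traverse from the single Dept document (traversal scope DOC_OF), then
    search the whole subgraph of each destination (search scope SUBGRAPH_OF)."""
    return {doc
            for dest in follow_global(web, dept_doc, lo, hi)
            for doc in subgraph_of(web, dest)
            if fnmatchcase(web[doc]["text"].lower(), keyword)}
```

Narrowing the traversal scope (for example `hi=1`) shrinks the set of destinations, while widening the search scope from one document to a subgraph is what lets the query find faculty pages that sit a few local links below a department's entry page.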

It is easy to see that the above query can be evaluated in a centralized man- 
ner at the user-site by importing the associated documents from each of the 
relevant web-sites, constructing a site graph and then processing the queries lo- 
cally. This centralized mode of operation is a common feature of most previous 
Web database system proposals, including the SQL-like W3QL with interfaces to 
Unix tools [11], the declarative logic-based WebLog [12], the hyperlink-pattern- 
based WebSQL [17], as well as the OQL-based Lorel [1] and the graph-based 
UnQL [4] for semi-structured databases. However, as mentioned earlier, central- 
ized approaches are inefficient from a variety of considerations including transfer 
of large amounts of unnecessary data resulting in network congestion and poor 
bandwidth utilization, the client-site becoming a processing bottleneck, and ex- 
tended user response times. We therefore discuss next an alternative distributed 
approach. 
4.3 Distributed Query Processing

In our distributed query-processing scheme, queries emanating from the user-site 
are forwarded from one site to another, the query is processed at each recipient 
site, and the associated results are returned to the user. Since our design ensures 
that the query forwarding does not require tight coordination from any “master
site”, it results in a highly distributed solution.

At an intuitive level, the distributed processing operates in the following fash- 
ion: The query is first sent to the sites corresponding to the StartPoints specified 
by the user in her query. Each of these sites completes its local processing of the 
query (which is some sub-query of the original query submitted by the user) 
and sends back the generated results, if any, to the user-site. Further, based on 
the structural hyperlink patterns (encoded as path regular expressions) in the 
query, it may modify the current query to reflect the completed processing of 
the sub-query and send the rest of the query to another set of sites. This set of 
sites is determined from the hyperlinks contained in the local site. These sites 
also perform similar query processing operations and the process continues until 
all the paths that match with the structural pattern have been fully explored 
and there are no more sub-queries remaining. 

The above strategy is implemented through QueryAgents. A QueryAgent is a
message that initially carries the entire query and its current processing state to
the StartPoints. At each site the agent state is updated to reflect the movement 
and local processing of the query, and new QueryAgents may be generated to 
carry the unprocessed part of the query forward to other sites. 

Results are directly returned from the query-site to the user-site. This is 
achieved by the user-site opening a listening communication socket to receive 
results - the associated port number is sent along with the Query Agent. When 
a query-site wishes to communicate results, it utilizes the IP address of the user- 
site and the port number which came along with the agent to directly transmit 
the results to the user. 
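The forwarding scheme can be simulated in a single Python process, as sketched below; the queue stands in for the network and each dequeued agent for an agent arriving at a site. The query format (a tuple of link-label patterns to traverse, then a keyword pattern to search for) and all names are hypothetical simplifications of DIASPORA's traversal expressions, and results go to a callback that plays the role of the user-site's listening socket.

```python
from collections import deque
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class QueryAgent:
    query_id: str
    entry_point: str   # URL of the document the agent is processed at
    state: int         # number of traversal steps already completed
    path: tuple        # hypothetical: link-label patterns still to traverse
    target: str        # keyword pattern searched once the path is done


def process_at_site(web, agent, send_to_user):
    """Local processing at a query-site: finish the query here, or spawn
    agents that carry the unprocessed rest of the query onwards."""
    doc = web[agent.entry_point]
    if agent.state == len(agent.path):
        if fnmatchcase(doc["text"].lower(), agent.target):
            send_to_user(agent.query_id, agent.entry_point)  # direct to user-site
        return []
    step = agent.path[agent.state]
    return [QueryAgent(agent.query_id, dst, agent.state + 1, agent.path, agent.target)
            for label, dst in doc["links"] if fnmatchcase(label.lower(), step)]


def run(web, start_points, path, target):
    """Send agents to the StartPoints and let them spread until none remain."""
    results = []
    queue = deque(QueryAgent("q1", s, 0, tuple(path), target) for s in start_points)
    while queue:
        agent = queue.popleft()
        queue.extend(process_at_site(web, agent,
                                     lambda q, url: results.append(url)))
    return results
```

In this simulation the emptiness of the queue tells us the query is finished; in the real, distributed setting no single party sees such a queue, which is the problem Section 4.4 addresses.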

4.4 Determining Query Completion 

Since, as described above, QueryAgents migrate from site to site without explicit 
user intervention, it is not easy to know when a query has fully completed its 
execution and all its results have been received - that is, how do we know for 
sure whether or not there still remain some agents that are active in the network. 
Note that solutions such as “timeouts” are difficult to implement in a coherent 
manner given the considerable heterogeneity in network and site characteristics. 
They are also unattractive in that a user may have to always wait until the 
timeout to be sure that the query has finished although it may have actually 
completed much earlier. 

To address the above problem, we have incorporated in DIASPORA a special 
mechanism called the CHT (Current Hosts Table) protocol. The CHT protocol 
requires a minimal amount of synchronization between the query-sites and the 
user-site, but in return for this minor reduction in the decentralization of the 
processing, it ensures an effective and elegant means for determining query
completion.

In this protocol, for each query submitted at a user-site, the local DIASPORA 
client process maintains a Current Hosts Table that keeps track of all the sites 
where the QueryAgents for this query are active. The attributes of the table are: 
(1) The URL of the EntryPoint (i.e. the starting HTML document) at the query- 
site, and (2) The state of the agent on arrival at the query-site. As described 
earlier in this section, after an agent arrives at a site and is processed, the local 
DIASPORA server determines the set of sites to which the new set of agents 
should be forwarded. Before forwarding the agents to these sites, the current 
site sends this “new-agent” information to the user-site in the form of a list of 
rows to be added to the CHT being maintained there. It also adds the URL of 
the EntryPoint and the (arrival) state of the agent that it received to the top 
of the list. When the user-site receives this list, it marks the entry in its CHT 
corresponding to the top-most entry in the list as deleted (signaling completion 
of query processing for the EntryPoint at the sending site) and inserts the list’s 
remaining new-agent entries into the CHT. When all the entries in the CHT have 
been marked as deleted, it can be concluded that the query has been completely 
processed. 

Note that only after the new-agent list is successfully sent are the agents for- 
warded to the next set of EntryPoints. The reason we process in this particular 
order is to ensure that the CHT at the user-site will always have complete knowl- 
edge about the sites at which the query is supposed to be currently executing and 
will therefore always be able to detect query completion. If the opposite order 
had been used, it is possible that the query may have been forwarded but the 
CHT not updated due to a transient communication failure between the current 
site and the user-site. This could lead to the possibility of the user-site wrongly 
determining that a query has completed when in fact it is still operational in the 
Web. 
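The bookkeeping of the CHT protocol can be sketched in Python as follows. Agents are identified here by (EntryPoint URL, arrival state) pairs with an integer state, and the toy web maps each URL to the EntryPoints it forwards to; both are assumptions. The important detail is in `process_and_report`: the report reaches the table before the new agents are returned for forwarding, so the table never shows the query as finished while an agent is still pending.

```python
from collections import deque


class CurrentHostsTable:
    """User-site table of (EntryPoint URL, arrival state, deleted?) rows."""

    def __init__(self, start_points):
        self.rows = [[url, 0, False] for url in start_points]

    def receive(self, report):
        """report[0] is the agent just processed; the rest are new agents."""
        (url, state), *new_rows = report
        for row in self.rows:
            if row[0] == url and row[1] == state and not row[2]:
                row[2] = True            # processing at that EntryPoint is done
                break
        self.rows.extend([u, s, False] for u, s in new_rows)

    def query_complete(self):
        return all(deleted for _, _, deleted in self.rows)


def process_and_report(cht, web, url, state):
    """One query-site step: report to the user-site FIRST, then forward."""
    new_agents = [(dst, state + 1) for dst in web[url]]
    cht.receive([(url, state)] + new_agents)   # in reality, a network message
    return new_agents                           # only now are agents forwarded


def run_query(web, start_points, observe=None):
    cht = CurrentHostsTable(start_points)
    pending = deque((url, 0) for url in start_points)
    while pending:
        if observe:
            observe(cht)
        url, state = pending.popleft()
        pending.extend(process_and_report(cht, web, url, state))
    return cht
```

Matching only the first undeleted row lets the table cope with two identical agents being in flight at once: each completion report retires exactly one of them.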



4.5 Query Termination 

If a user decides to cancel an ongoing query, this message has to be communicated 
to all the sites that are currently processing the query. One option would be for 
the user-site to actively send termination messages to all the sites associated with 
the URLs listed in the Current Hosts Table. An alternative would be to purge the
query locally at the user-site and to close the listening socket associated with
the query. Subsequently, when any of the sites involved in processing this
query attempts to contact the user-site to return its local results, the connection
will fail; this failure signals the site to terminate the query locally. Note
that since we insist that the CHT related information should first be sent to 
the user-site before forwarding the query to other sites, we do not run into the 
problem of termination messages having to “chase” query messages in the Web 
(this is similar to the problem of “anti-messages” chasing “event messages” in 
distributed optimistic simulation [7]). 
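The query-site side of passive termination might look like the Python sketch below, which uses the standard `socket` module. The function name and the decision to treat any `OSError` as a cancellation signal are assumptions; a real implementation would probably want to distinguish a refused connection from transient network errors before purging the query.

```python
import socket


def return_results(user_host, user_port, payload: bytes) -> bool:
    """Try to ship local results to the user-site's listening socket.

    True: results delivered, so the agent may carry on.
    False: connection failed, so treat the query as cancelled, purge it
    locally and forward no further agents (passive termination)."""
    try:
        with socket.create_connection((user_host, user_port), timeout=5) as conn:
            conn.sendall(payload)
        return True
    except OSError:          # e.g. ConnectionRefusedError once the socket is closed
        return False
```

No termination message is ever sent: the user-site only has to stop listening, and each site discovers the cancellation the next time it has something to report.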
4.6 Hosting Issues

An implicit assumption in the above framework is that a query processor capable 
of handling DIASPORA queries is executing as a daemon process at each site 
participating in the distributed execution of the query. At first sight, this re- 
quirement may appear unrealistic to fulfill - however, such distributed facilities 
are already becoming prevalent with the rapid spread of mobile agent technol- 
ogy [18]. Similar architectures have also been successfully implemented in the 
Condor distributed job execution facility [16] (now productized by IBM and
called LoadLeveler). Further, even if some sites were to refuse to participate in
this effort, we can always revert to the traditional centralized approach for the 
queries related to these sites. That is, we can have a hybrid query engine that is 
a combination of distributed and centralized processing. 

Note also that for specific “domains” - for example, a campus or a company 
- that have a controlling authority, it may be quite feasible to have DIASPORA 
run at each site in the domain. Therefore, a starting point would be to use DI- 
ASPORA within such environments and then graduate to perhaps incorporating 
larger portions of the web. 

Further, query-sites, especially those providing commercial or public services, 
may have a “selfish” motive for hosting DIASPORA: the fact that queries are
run locally gives a site much more information about what users want, which
can help it structure its services much better. That is, the ability to do “query
mining”, i.e., to discover interesting patterns in what people are looking for,
can be the incentive for sites to participate in this cooperative endeavor.

4.7 Eliminating Query Recomputations 

Due to the highly interconnected structure of the web, different agents of the 
original query may visit the same site in effectively the same state of computation 
following different paths. Note that if we do not detect these duplicate cases and 
blindly compute all queries that are received, not only is it a waste locally but 
subsequently the same sequence of steps followed by a previous agent will take 
place - in effect, we may have a “mirror” agent chasing a previously processed 
agent over the Web. This will also have repercussions at the user-site since 
the same set of results will be received multiple times and these will have to 
be filtered. In short, permitting duplicate query processing can have serious 
computation and communication performance implications. 

From the above discussion, it is clear that each site should be able to evaluate 
the current state of an agent and also store this information locally in order to 
permit future comparisons. This is solved in DIASPORA using an Agent Log Table
that contains information with regard to agents that have previously visited the 
site, including the URL of the EntryPoint on which the agent is processed, the 
global identifier of the query, and the state of the agent. 

When a new agent arrives at a query server, a new log table record is con- 
structed for this agent and it is checked whether an equivalent entry already 
exists in the log table. If an equivalent entry exists, the agent is purged;
otherwise, the new record is inserted in the log table, the agent is updated if
required, and it is then processed locally.
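A minimal Python sketch of duplicate elimination with an Agent Log Table. What counts as an “equivalent” agent is an assumption here: the same global query identifier, EntryPoint URL and agent state. The state string "L*" is a hypothetical placeholder for an agent that keeps following links in the same state, which is exactly the case where cyclic links would otherwise make mirror agents circulate forever.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class AgentLogTable:
    """Per-site record of agents already processed at this site."""

    def __init__(self):
        self._entries = set()

    def admit(self, query_id, entry_point, state):
        """Return True if the agent should be processed, False to purge it."""
        record = (query_id, entry_point, state)
        if record in self._entries:
            return False
        self._entries.add(record)
        return True


def simulate(web, start, query_id="q1", state="L*"):
    """Spread one agent over a (possibly cyclic) web; each site keeps its own
    log table, keyed here by the host part of the EntryPoint URL."""
    logs = defaultdict(AgentLogTable)
    processed, pending = [], [start]
    while pending:
        url = pending.pop()
        if not logs[urlsplit(url).netloc].admit(query_id, url, state):
            continue                      # duplicate "mirror" agent: purge it
        processed.append(url)
        pending.extend(web[url])          # the state does not change here
    return processed
```

Without the `admit` check the loop above would never terminate on a cyclic web, and the user-site would receive the same results over and over.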

4.8 Reduction of Network Traffic 

As discussed before, with DIASPORA no web resource is ever downloaded to 
perform a query operation over it. This is in marked contrast to the centralized 
approaches taken in search engines and in many of the previously proposed Web 
querying systems, including [17,12,11]. Apart from this, the additional optimiza- 
tions are: 

1. The agent results and the newly generated CHT information to be added to 
the CHT at the user site are shipped together. Further, if a query is received 
for multiple EntryPoints at a common web-site, all the associated results 
and corresponding CHT information are batched together and sent to the user-site.

2. When forwarding agents, if the agents are to be sent to multiple EntryPoints 
that are all physically located at a common remote site, they are bundled 
together and sent only once. 

3. Query termination is implemented passively, as described in Section 4.5, 
therefore not requiring additional termination messages from the query site 
to the sites currently hosting the agents of this query. 
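Optimizations 1 and 2 amount to grouping by destination. The Python sketch below groups outgoing EntryPoints by host name, so that one message per remote site suffices, and packs the results and CHT lists for several EntryPoints into one reply to the user-site. The message layout is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def bundle_agents(entry_points):
    """Group outgoing agents' EntryPoints by destination host (optimization 2)."""
    bundles = defaultdict(list)
    for url in entry_points:
        bundles[urlsplit(url).netloc.lower()].append(url)
    return dict(bundles)


def user_site_message(reports):
    """Ship results and CHT lists for several EntryPoints together
    (optimization 1). Each report: {"results": [...], "cht": [rows]}."""
    return {"results": [r for rep in reports for r in rep["results"]],
            "cht_lists": [rep["cht"] for rep in reports]}
```

Each CHT list keeps its own top-most "processed" entry, so the user-site can still retire the right rows after unpacking the combined message.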

4.9 Performance Evaluation 

Based on the above design, a prototype implementation of DIASPORA has been 
developed. The prototype is fully developed in Java - the details of the imple- 
mentation are available in [21]. We evaluated our prototype of the DIASPORA 
system on a testbed of representative sites on our campus network. Our exper- 
imental results indicate that DIASPORA considerably reduces network traffic 
and improves user response times as compared to equivalent centralized systems. 

5 Conclusions 

In this paper, we motivated why search engines, due to their coarse query granu- 
larity and centralized architectures, may be expected to run out of steam in the 
future with the anticipated large increases in both the size of the Web and the 
query traffic. We discussed how database technology could be used to address 
these problems and mentioned various ways in which Web technology has been 
integrated with database technology, with special emphasis on the framework 
where the Web itself is treated as an enormous (and potentially the largest in 
the world) database of information. For this framework, we presented the high- 
lights of the design of a new Web database system called DIASPORA that we 
have developed and successfully tested on our campus network. The system sup- 
ports fine-grained content and structural queries and implements a distributed 
processing architecture. DIASPORA also opens up opportunities for mining user 
queries to improve commercial and public services offered by web-sites. 
Abstract. In this paper, we describe techniques that can be used to
incorporate updates at a data warehouse on the fly. An incremental view
maintenance algorithm is described that incorporates updates from data
sources at a dynamic data warehouse. Similarly, partitioning and recursive
decomposition techniques are described that allow efficient querying
and updating of summary data at the warehouse for analytical processing.



1 Introduction 

Data warehousing is used for reducing the load of on-line transactional systems 
by extracting and storing the data needed for analytical purposes (e.g., decision 
support, data mining). A materialized view of the system is kept at a site called 
the data warehouse, and user queries are processed using this view. The view has 
to be maintained to reflect the updates done against the base relations stored at 
the various data sources. The efficient incremental maintenance of materialized 
views has become an important research issue since the update efficiency of the 
warehouse view is counterbalanced by the query overhead at the data sources. 
Several approaches have focused on the problems associated with incremental 
view maintenance [ZGMHW95,HZ96,GJM96,QGMW96,GM95,CGL+96,RKZ00]. 

Commercial approaches for data warehousing and on-line analytical processing
rely on batch updating of data. Although several warehouse maintenance
algorithms have been proposed in the literature, they are largely ignored, since
current warehousing applications do not consider on-line updates to be of much
importance. In the context of OLAP tools such as the data cube, the situation is
similar. Much of the research effort is directed towards building OLAP tools
such as the data cube that are optimized primarily for query processing [HAMS97].
In doing so, data cubes leave out a large class of applications that form the
core of OLAP: what-if queries. Thus, in current data warehousing and analysis
applications, on-line update complexity is rarely considered to be of significance.

Most warehousing and analysis systems are oriented towards batch updates, 
and for a wide variety of current-day business applications this is considered 
sufficient. As the role of digital technology and electronic interaction proliferates 
in all aspects of our lives, it is desirable to make the data integration capabilities
of a warehouse as well as the data analysis capabilities of a data cube accessible to 
entities that are not always large enterprises. In this paper, we focus on the issue 
of on-line updates in the context of view maintenance and analytical processing 
at dynamic data warehouses. 

2 View Maintenance at Data Warehouses

View maintenance algorithms need to be designed for an environment in which
there are multiple distributed autonomous sources, and data at a source may be
updated by local update transactions that are independent of the update
transactions at other sources. Several consistency notions have been associated
with the views at the data warehouse, viz., complete consistency, strong
consistency, weak consistency, and convergence [ZGMHW95,ZGMW96]. In this paper,
we focus on algorithms that support complete consistency and strong consistency.



2.1 The Data Warehouse Model 

Updates occurring at the data sources can be classified into three categories: 

1. Single update transactions where each update is executed at a single data 
source. 

2. Source local transactions where a sequence of updates is performed as a
single transaction. However, all of the updates are directed to a single data
source.

3. Global transactions where the updates involve multiple data sources. 

In this paper we will restrict our attention to updates of types 1 and 2. Depending 
on how the updates are incorporated into the view at the data warehouse, dif- 
ferent notions of consistency of the view have been identified [ZGMW96,HZ96]: 

— Convergence where the updates are eventually incorporated into the mate- 
rialized view. 

— Strong consistency where the order of state transformations of the view at 
the data warehouse corresponds to the order of the state transformations at 
the data sources. 

— Complete consistency where every state of the data sources is reflected as 
a distinct state at the data warehouse, and the ordering constraints among 
the state transformations at the data sources are preserved at the data ware- 
house. 

Commercially available data warehouse products such as Red Brick Systems
[RBS96] only ensure convergence.

The architecture of the data warehouse is shown in Figure 1 [HGMW+95].
The underlying system used for the data warehouse consists of n sites for data
sources and another site for storing and maintaining the materialized view of
the data warehouse. The communication between each data source and the data
warehouse site is assumed to be reliable and FIFO. The underlying database
model for each data source is assumed to be the relational data model, with one
relation at each source. Updates at each source are monitored and are transmitted
asynchronously from the source to the data warehouse.

Fig. 1. An Architecture of a Data Warehouse (data sources at Site 1, Site 2, ...,
Site n; update/query processor at the warehouse site)

We assume that the view function used at the data warehouse for the materialized view is defined by the SPJ-expression (selection-projection-join):

$$\pi_{ProjAttr}\,\sigma_{SelectCond}(R_1 \bowtie R_2 \bowtie \cdots \bowtie R_n)$$

The updates to the base relations are assumed to be inserts and deletes of tuples; a modify is modeled as a delete followed by an insert. We assume that the multiplicity of a tuple is maintained in a control field that records the number of occurrences of each tuple [GMS93]. The main problem is to maintain the materialized view at the data warehouse in the presence of updates. Recomputing the view from scratch after each update is unrealistic; a more appropriate solution is to update the data warehouse incrementally. In the absence of concurrent updates, when an update $\Delta R_i$, i.e., an update to the base relation $R_i$, is received at the data warehouse, the incremental changes are computed as follows [HJ91,HZ96,GM95,GHJ96]:

$$\pi_{ProjAttr}\,\sigma_{SelectCond}(R_1 \bowtie \cdots \bowtie \Delta R_i \bowtie \cdots \bowtie R_n)$$
Using the above notation, $\Delta R$ for an insert carries a positive sign and hence has the effect of adding tuples to the materialized view. Deletes, on the other hand, carry a negative sign and therefore remove the appropriate tuples from the materialized view.
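The signed-count idea can be shown on a toy instance. This is a minimal sketch under several assumptions. Each relation is a Python `Counter` that maps a tuple to its multiplicity, which stands in for the control field of [GMS93]. Selection and projection are omitted, so the view is just R1 joined with R2 on R1's last attribute and R2's first. The helper names `join` and `apply_delta` are hypothetical.

```python
from collections import Counter

def join(left, right):
    """Join two signed multisets on left's last attribute == right's first.

    Multiplicities multiply, so a deleted tuple (count -1) yields
    negative counts in the join result.
    """
    out = Counter()
    for lt, lc in left.items():
        for rt, rc in right.items():
            if lt[-1] == rt[0]:
                out[lt + rt[1:]] += lc * rc
    return Counter({t: c for t, c in out.items() if c != 0})

def apply_delta(base, delta):
    """Add signed counts; a tuple whose count reaches zero disappears."""
    out = Counter(base)
    for t, c in delta.items():
        out[t] += c
    return Counter({t: c for t, c in out.items() if c != 0})

# Base relations R1(name, k) and R2(k, tag); the view is R1 join R2.
R1 = Counter({("a", 1): 1, ("b", 2): 1})
R2 = Counter({(1, "x"): 1, (2, "y"): 1})
view = join(R1, R2)

# Delta R1: insert ("c", 1) with sign +1, delete ("b", 2) with sign -1.
dR1 = Counter({("c", 1): 1, ("b", 2): -1})

# Incremental step: only dR1 join R2 is computed and added to the view.
view = apply_delta(view, join(dR1, R2))
R1 = apply_delta(R1, dR1)

assert view == join(R1, R2)   # same result as recomputing from scratch
```

The code uses explicit loops instead of `Counter`'s `+` and `-` operators, because those operators silently drop non-positive counts, and negative counts are exactly what encode deletes here.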

The data warehouse problem is to maintain the views incrementally in the presence of concurrent updates occurring at the data sources. Consider a view involving a join of three relations, $R_1 \bowtie R_2 \bowtie R_3$. When an update $\Delta R_1$ arrives at the data warehouse, we can compose the incremental query $Q_1 = \Delta R_1 \bowtie R_2 \bowtie R_3$ and send it to the data source. While $Q_1$ is in transit from the data warehouse to the data source, further updates, e.g., $\Delta R_2$, may occur at the data source and may be delivered to the data warehouse. As a result of the two updates, the view should change to the following:



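$$(R_1 + \Delta R_1) \bowtie (R_2 + \Delta R_2) \bowtie R_3 = (R_1 \bowtie R_2 \bowtie R_3) + (\Delta R_1 \bowtie R_2 \bowtie R_3) + (R_1 \bowtie \Delta R_2 \bowtie R_3) + (\Delta R_1 \bowtie \Delta R_2 \bowtie R_3)$$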



The answer to the incremental query $Q_1$ will include the effects of $\Delta R_1$ and $\Delta R_2$ and hence will be $A_1 = (\Delta R_1 \bowtie R_2 \bowtie R_3) + (\Delta R_1 \bowtie \Delta R_2 \bowtie R_3)$. We refer to $\Delta R_1 \bowtie \Delta R_2 \bowtie R_3$ as the error term in the incremental answer due to the concurrent update $\Delta R_2$. Incorporating $A_1$ into the materialized view will not reflect all the changes that should have occurred after the two updates: the term $R_1 \bowtie \Delta R_2 \bowtie R_3$ is missing. A blind formulation of the incremental query $Q_2$ as $R_1 \bowtie \Delta R_2 \bowtie R_3$ will result in an incorrect answer, since $Q_1$ has already partially incorporated the effects of update $\Delta R_2$. The ECA [ZGMHW95] protocol is based on this idea and uses the notion of “compensation” to formulate a query $Q_2$ that offsets the error term introduced in $A_1$:

$$Q_2 = (R_1 \bowtie \Delta R_2 \bowtie R_3) - (\Delta R_1 \bowtie \Delta R_2 \bowtie R_3)$$
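The effect of the error term and of the compensating query can be checked on a toy instance. This is a minimal sketch with the following assumptions:

- Relations are `Counter` multisets chained on matching end attributes, which is a hypothetical encoding.
- Both updates have already been applied at the sources when $Q_1$ and $Q_2$ are evaluated.
- Selection and projection are omitted.

```python
from collections import Counter

def join(*rels):
    """Chain join: a tuple's last attribute must equal the next tuple's first.
    Signed multiplicities multiply."""
    out = rels[0]
    for rel in rels[1:]:
        nxt = Counter()
        for lt, lc in out.items():
            for rt, rc in rel.items():
                if lt[-1] == rt[0]:
                    nxt[lt + rt[1:]] += lc * rc
        out = nxt
    return Counter({t: c for t, c in out.items() if c != 0})

def combine(a, b, sign=1):
    """Signed multiset a + sign * b, dropping zero counts."""
    out = Counter(a)
    for t, c in b.items():
        out[t] += sign * c
    return Counter({t: c for t, c in out.items() if c != 0})

R1 = Counter({("a", 1): 1})
R2 = Counter({(1, "p"): 1})
R3 = Counter({("p", "X"): 1, ("q", "Y"): 1})
view = join(R1, R2, R3)

dR1 = Counter({("b", 1): 1})   # reaches the warehouse first, triggers Q1
dR2 = Counter({(1, "q"): 1})   # happens at source 2 while Q1 is in transit

# Both updates are already applied at the sources when the queries run.
R1_now, R2_now = combine(R1, dR1), combine(R2, dR2)

A1 = join(dR1, R2_now, R3)                # answer to Q1, polluted by dR2
error = join(dR1, dR2, R3)                # the error term
blind_A2 = join(R1_now, dR2, R3)          # uncompensated Q2
A2 = combine(blind_A2, error, sign=-1)    # compensated Q2, as in ECA

target = join(R1_now, R2_now, R3)
assert combine(combine(view, A1), blind_A2) != target  # error counted twice
assert combine(combine(view, A1), A2) == target
```

With the blind $Q_2$, the tuple `("b", 1, "q", "Y")` ends up with multiplicity 2. Subtracting the error term restores the correct view.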



2.2 On-Line Error Correction of Incremental View Computations 

Many of the proposed algorithms [ZGMHW95,ZGMW96] completely evaluate the answer to a query before doing any compensation. As a consequence, all updates that are received at the data warehouse between the time the query is initiated and the time it is fully evaluated are considered concurrent updates. However, in a distributed setting, an update from a data source interferes with a query only if the update occurs between the sending of the query to that data source and the receipt of the answer from it. If, on the other hand, the update occurs after the query has been evaluated at that source, then it does not interfere with the answer, and the answer should not be compensated for it.
Waiting to do the compensation until the query has been completely evaluated at all data sources also forfeits the opportunity to subtract the error term locally at the data warehouse itself. Instead, the compensation has to be done by sending compensating queries to eliminate the effects of concurrent updates. In a distributed setting, the answer to a query cannot be evaluated atomically. As shown in Figure 2, a query $R_1 \bowtie \cdots \bowtie \Delta R_i \bowtie R_{i+1} \bowtie \cdots \bowtie R_n$ needs to be computed iteratively: first leftward from $R_i$ and then rightward from $R_i$. The leftward part proceeds by performing the computation at $R_{i-1}$, obtaining an answer $A_{i-1}$, then performing the computation at $R_{i-2}$, and so on, as shown in the figure. The rightward part proceeds similarly.



[Figure: sources R1, R2, ..., Ri-1, Ri, Ri+1, ...; labels: update delta Ri, answer]
Fig. 2. On-line Incremental View Computation



While this query is being evaluated, updates may occur at any of these sources, and as a result an error term may be introduced into the answers from the individual sites. For example, while the query initiated as a result of $\Delta R_i$ is in progress, an update $\Delta R_{i-1}$ may occur before $R_{i-1} \bowtie \Delta R_i$ is evaluated at $R_{i-1}$. Because of the FIFO property of the communication channels, the data warehouse must receive $\Delta R_{i-1}$ before it receives the answer to the query $R_{i-1} \bowtie \Delta R_i$. From this the data warehouse can conclude that the answer includes the error term $\Delta R_{i-1} \bowtie \Delta R_i$. That term can be evaluated locally, and its effects can be eliminated from the answer set, yielding the desired answer $R_{i-1} \bowtie \Delta R_i$. We refer to this as on-line error correction, since it eliminates the effects of concurrent updates as soon as they are detected at the data warehouse. In contrast, in the Strobe/C-strobe algorithms this error accumulates until the entire query has been evaluated by querying all data sources. Under that approach, the local information at the data warehouse is not sufficient to eliminate the accumulated error, and hence the data sources need to be queried to compensate for it.

In the general case, the warehouse may receive a concurrent update $\Delta R_j$, $j < i$, while it is evaluating the incremental query resulting from $\Delta R_i$. As before, when the answer arrives from the data source $R_j$, it is $(R_j + \Delta R_j) \bowtie R_{j+1} \bowtie \cdots \bowtie R_{i-1} \bowtie \Delta R_i$ instead of $R_j \bowtie R_{j+1} \bowtie \cdots \bowtie R_{i-1} \bowtie \Delta R_i$. The error term included in the answer is $\Delta R_j \bowtie R_{j+1} \bowtie \cdots \bowtie R_{i-1} \bowtie \Delta R_i$. It can be evaluated locally at the data warehouse, since both components, $\Delta R_j$ and $R_{j+1} \bowtie \cdots \bowtie R_{i-1} \bowtie \Delta R_i$, are available there. Note that the latter term is the partially evaluated answer from $R_{j+1}$ for the query initiated on account of $\Delta R_i$. The case of a concurrent update $\Delta R_j$, $j > i$, is symmetric.
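The leftward-then-rightward evaluation with local removal of error terms can be sketched as follows. This is a simplified illustration, not the full SWEEP algorithm of [AESY97]. It makes these assumptions:

- Relations are `Counter` multisets chained on matching end attributes, and sites are 0-indexed.
- Each site is queried once, giving n - 1 round trips.
- A concurrent update $\Delta R_j$ that the warehouse has already received is passed in explicitly.

The functions `join2`, `combine` and `sweep` are hypothetical.

```python
from collections import Counter

def join2(left, right):
    """Join on left's last attribute == right's first; signs multiply."""
    out = Counter()
    for lt, lc in left.items():
        for rt, rc in right.items():
            if lt[-1] == rt[0]:
                out[lt + rt[1:]] += lc * rc
    return Counter({t: c for t, c in out.items() if c != 0})

def combine(a, b, sign=1):
    """Signed multiset a + sign * b, dropping zero counts."""
    out = Counter(a)
    for t, c in b.items():
        out[t] += sign * c
    return Counter({t: c for t, c in out.items() if c != 0})

def sweep(i, delta, sources, concurrent):
    """View change for delta = dR_i over the chain R_0 ... R_{n-1}.

    sources[j]    : R_j as seen when the query reaches site j (it may already
                    include a concurrent update).
    concurrent[j] : that concurrent update; by FIFO the warehouse has it
                    before site j's answer, so its error term is removed locally.
    Returns (view_change, number_of_sites_queried).
    """
    partial, queried = delta, 0
    for j in range(i - 1, -1, -1):                  # leftward from R_i
        answer = join2(sources[j], partial)          # evaluated at site j
        if j in concurrent:                          # on-line error correction
            answer = combine(answer, join2(concurrent[j], partial), sign=-1)
        partial, queried = answer, queried + 1
    for j in range(i + 1, len(sources)):             # then rightward from R_i
        answer = join2(partial, sources[j])
        if j in concurrent:
            answer = combine(answer, join2(partial, concurrent[j]), sign=-1)
        partial, queried = answer, queried + 1
    return partial, queried

R0 = Counter({("a", 1): 1})
R1 = Counter({(1, "p"): 1})
R2 = Counter({("p", "X"): 1, ("q", "Y"): 1})
dR1 = Counter({(1, "q"): 1})     # the update being propagated (site 1)
dR0 = Counter({("b", 1): 1})     # concurrent update at site 0

seen = [combine(R0, dR0), R1, R2]              # site 0 already reflects dR0
change, messages = sweep(1, dR1, seen, {0: dR0})
uncorrected, _ = sweep(1, dR1, seen, {})
```

Without the correction, the answer wrongly contains `("b", 1, "q", "Y")`, which is the error term. With it, no compensating query is sent back to any source. The concurrent update itself would then be propagated by its own sweep, which this sketch does not model.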



Algorithm   Architecture   Consistency   Msg Cost   Comments
ECA         Centralized    Strong        O(1)       Remote compensation; quadratic message size; requires quiescence
Strobe      Distributed    Strong        O(n)       Unique key assumption; requires quiescence
C-strobe    Distributed    Complete      O(n!)      Unique key assumption; not scalable
SWEEP       Distributed    Complete      O(n)       Local compensation

Table 1. Comparison of various view maintenance algorithms



In [AESY97], we developed an algorithm based on on-line error correction, referred to as SWEEP, to update the materialized view at the data warehouse incrementally for every update. The updates occurring at different data sources are assumed to be totally ordered by the order in which they are delivered to the data warehouse, and the materialized view is updated in that order. Thus, for an update u, the algorithm ensures that the effects of all the updates that arrived at the data warehouse before u are reflected in the materialized view, but none of the effects of the updates that arrived after u are included. Hence, the algorithm ensures complete consistency. Another property of this algorithm is that, in contrast to earlier approaches to view maintenance, the error compensation is completely localized at the data warehouse. As a consequence, the number of messages needed to compute the view change per update is linear in the number of data sources: only $n-1$ messages are needed, where n is the number of data sources. This is significantly cheaper than C-strobe, which supports the same notion of consistency as SWEEP but has a message complexity of $(n-1)!$ in the worst case. The algorithm can easily be extended to compute view changes for multiple updates [AESY97]. Table 1 compares the properties of SWEEP with those of some of the known algorithms for incremental view maintenance.

3 Data Cube for On-Line Analytical Processing 

Data cubes [HRU96] are used at data warehouses to summarize aggregate information from potentially multiple sources. In this section, we turn our attention to a different and relevant issue that arises in data warehouses: how to update a data cube once the update information is known. As discussed before, data cubes have been considered solely as read-only information useful for analysis, and hence update costs are exorbitant. We believe that this is a main hurdle to the wide use of powerful speculative tools. It also hinders the wide dissemination of analysis tools to a diverse community of users whose applications require, if not frequent updates, at least inexpensive ones.

3.1 The Data Cube Model 

A data cube is designed to provide aggregate information over certain dimen- 
sions of the data. In general, it has a single measure attribute, e.g., COST, # 
of documents, etc., and d feature attributes, e.g., age, longitude, latitude, 
time, etc. Without loss of generality, we will assume each feature dimension 
has the same size, denoted by n. We can represent the d-dimensional data cube 
by a d-dimensional array A of size $n^d$. Each cell in array A contains the aggregate value of the measure attribute corresponding to a given point in the
d-dimensional space formed by the dimensions. For example, given the measure 
attribute SALES and the dimensions AGE and DATE, the cell at A[37, 220] con- 
tains the total sales to 37-year-old customers on day 220. Thus, a range-sum 
query asking for the total sales to 37-year-old customers from days 220 to 222 
would be answered by summing the cells A[37, 220], A[37, 221], and A[37, 222]. 
Fig. 3. The base value and prefix sum of a two-dimensional data cube



Array A can be used by itself to answer range sum queries; we will refer to this as the naive method. Arbitrary range queries on array A can cost $O(n^d)$: a range query over the range of the entire array will require summing every cell in the array. Updates to array A take $O(1)$: given any new value for a cell, an update can be achieved simply by changing the cell's value in the array. The prefix sum approach [HAMS97] achieves $O(1)$ complexity for queries and $O(n^d)$ complexity for updates. The essential idea of the prefix sum approach is to precompute many prefix sums of the data cube, which can then be used to answer ad hoc queries at run-time. Figure 3 shows the array P employed by the prefix sum approach, derived from array A. Each cell $P[i, j]$ in array P stores the sum of all cells that precede it in array A, i.e., SUM(A[0,0]:A[i,j]). Using the prefix sum method, arbitrary range sum queries can be evaluated by adding and subtracting a constant number of cells in array P, as illustrated geometrically in Figure 4.
Fig. 4. A geometric illustration of the two-dimensional case
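The query/update trade-off between the naive and prefix sum methods can be illustrated for d = 2. This is a minimal sketch, assuming a small list-of-lists array with 0-based indices; the function names are hypothetical.

```python
def prefix_sums(A):
    """P[i][j] = SUM(A[0][0]:A[i][j]) for a 2-D list-of-lists array A."""
    rows, cols = len(A), len(A[0])
    P = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            P[i][j] = (A[i][j]
                       + (P[i - 1][j] if i > 0 else 0)
                       + (P[i][j - 1] if j > 0 else 0)
                       - (P[i - 1][j - 1] if i > 0 and j > 0 else 0))
    return P

def naive_range_sum(A, r1, c1, r2, c2):
    """Naive method: read every cell in the query box."""
    return sum(A[i][j] for i in range(r1, r2 + 1) for j in range(c1, c2 + 1))

def prefix_range_sum(P, r1, c1, r2, c2):
    """Prefix sum method: at most four cells of P, combined as in Figure 4."""
    def p(i, j):
        return P[i][j] if i >= 0 and j >= 0 else 0
    return p(r2, c2) - p(r1 - 1, c2) - p(r2, c1 - 1) + p(r1 - 1, c1 - 1)

def prefix_update(P, i, j, delta):
    """Adding delta to A[i][j] changes every P[x][y] with x >= i and y >= j."""
    touched = 0
    for x in range(i, len(P)):
        for y in range(j, len(P[0])):
            P[x][y] += delta
            touched += 1
    return touched

A = [[3, 5, 1],
     [2, 4, 6],
     [7, 0, 8]]
P = prefix_sums(A)
assert naive_range_sum(A, 1, 1, 2, 2) == prefix_range_sum(P, 1, 1, 2, 2) == 18

A[0][0] += 1
touched = prefix_update(P, 0, 0, 1)   # worst case: all n^d = 9 cells of P
```

Updating the origin cell is the worst case: every cell of P must change, which is the $O(n^d)$ update cost mentioned above.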




3.2 Update Efficient Data Cubes 

The relative prefix sum [GAAS99] makes use of new data structures that provide 
constant time queries while controlling cascading updates. By creating bound- 
aries that limit cascading updates to distinguished cells, the method reduces the 
overall complexity of the range sum problem. Our method makes use of two 
components: an overlay and a relative-prefix (RP) array. An overlay partitions 
array A into fixed size regions called overlay boxes. Overlay boxes store informa- 
tion regarding the sums of regions of array A preceding them. RP is an array of 
the same size as array A; it contains relative prefix sums within regions defined 
by the overlay. Using the two components in concert, we construct prefix sums on-the-fly. Together, the components limit cascading updates to distinguished cells, thus limiting the cost of updates while keeping the cost of queries constant as in the prefix-sum method.
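The bounded cascade can be made concrete with a one-dimensional simplification. This is an assumption of the sketch: [GAAS99] works on d-dimensional arrays, whose overlay boxes also store border values.

Cells are grouped into boxes of size k ≈ √n. Each overlay value holds the sum of all cells before its box, and RP holds prefix sums relative to the start of each box. A query reads one overlay value and one RP cell. An update touches at most k RP cells plus at most n/k overlay values. The class name is hypothetical.

```python
import math

class RelativePrefixSum1D:
    """1-D sketch of the overlay + relative-prefix (RP) idea."""

    def __init__(self, A, k=None):
        self.n = len(A)
        self.k = k or max(1, math.isqrt(self.n))
        self.A = list(A)
        self.RP = [0] * self.n
        self.overlay = [0] * ((self.n + self.k - 1) // self.k)
        running = 0
        for i, v in enumerate(A):
            if i % self.k == 0:                 # first cell of a box
                self.overlay[i // self.k] = running
                self.RP[i] = v
            else:
                self.RP[i] = self.RP[i - 1] + v
            running += v

    def prefix(self, i):
        """SUM(A[0]:A[i]) from one overlay value and one RP cell: O(1)."""
        return self.overlay[i // self.k] + self.RP[i]

    def range_sum(self, lo, hi):
        return self.prefix(hi) - (self.prefix(lo - 1) if lo > 0 else 0)

    def update(self, i, value):
        """Set A[i] = value; return how many stored values had to change."""
        delta = value - self.A[i]
        self.A[i] = value
        b = i // self.k
        touched = 0
        for j in range(i, min((b + 1) * self.k, self.n)):   # stays inside box b
            self.RP[j] += delta
            touched += 1
        for bb in range(b + 1, len(self.overlay)):         # later boxes only
            self.overlay[bb] += delta
            touched += 1
        return touched

rps = RelativePrefixSum1D(list(range(1, 17)))   # n = 16, boxes of k = 4
before = rps.range_sum(5, 10)                   # 6 + 7 + ... + 11
touched = rps.update(0, 11)                     # 4 RP cells + 3 overlay values
```

Updating the first cell changes 7 stored values here, where a plain prefix sum array would change all 16.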

We briefly describe the notion of the Basic Dynamic Data Cube that incorpo- 
rates updates to the data cube. The dynamic data cube utilizes a tree structure 
which recursively partitions array A into overlay boxes. Each overlay box will 
contain information regarding relative sums of regions of A. By descending the 
tree and adding these sums, we efficiently construct sums of regions which begin 
at A[0, 0] and end at any arbitrary cell in A. To calculate complete region sums 
from the tree, we also make use of the inverse property of addition as illustrated 
in Figure 4. We will first describe overlays for dynamic data cubes, then describe 
their use in constructing the Basic Dynamic Data Cube. 

During the construction of the Basic Dynamic Data Cube, overlay boxes are organized as a tree to recursively partition array A (Figure 6). We define an overlay as a set of disjoint hyperrectangles (hereafter called “boxes”) of equal size that completely partition the cells of array A into non-overlapping regions. Each overlay box stores certain values: S is the subtotal cell, X1, X2, X3 are row sum cells in the first dimension, and Y1, Y2, Y3 are row sum cells in the second dimension, as shown in Figure 5. Figure 6 shows array A from Figure 3 partitioned into overlay boxes of size 4 × 4. The root node of the tree encompasses the complete range of array A. The root node forms children by dividing its range in each dimension in half, and it stores a separate overlay box for each child. Each of its children is in turn subdivided into children, for which overlay boxes are stored; this recursive partitioning continues down to the leaf level. Since a single-cell overlay box contains only the subtotal cell, the leaf level corresponds to the values stored in the original array A.

Fig. 5. Calculation of row sum values stored in overlay boxes
Fig. 6. Tree structure of the Dynamic Data Cube
The query process begins at the root of the tree. Using the target cell, which is the cell whose value needs to be computed, the algorithm checks the relationship between the target cell and the overlay boxes in the node:

- When an overlay box covers the target cell, a recursive call to the function is performed, using the child associated with the overlay box as the node parameter.
- When the target cell is after the overlay box in every dimension, the target region includes the entire overlay box, and the box contributes its subtotal cell to the sum (overlay box labeled Q in Figure 6).
- When the cell is neither before nor completely after the overlay box (overlay boxes labeled R and S in Figure 6), the target region intersects the overlay box, and the box contributes a row sum value to the sum.

Exactly one child is descended at each level of the tree (as shown in Figure 6), since overlay boxes completely partition array A into disjoint regions. Hence, the query complexity is $O(\log n)$.
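The single-path descent can be illustrated with a one-dimensional simplification, which is an assumption of this sketch. In one dimension an overlay box is either wholly before the target cell or covers it, so only subtotal cells are needed and the row sum values of Figure 5 do not arise. The hypothetical class `DynamicCube1D` also includes the bottom-up update described next.

```python
class DynamicCube1D:
    """1-D simplification of the Basic Dynamic Data Cube tree.

    Node [lo, hi) is halved into children; each child range acts as an
    overlay box whose subtotal is kept in `sub`.
    """

    def __init__(self, A):
        self.n = len(A)
        self.A = list(A)
        self.sub = {}                       # (lo, hi) -> SUM(A[lo:hi])
        self._build(0, self.n)

    def _build(self, lo, hi):
        if hi - lo == 1:
            self.sub[(lo, hi)] = self.A[lo]
        else:
            mid = (lo + hi) // 2
            self.sub[(lo, hi)] = self._build(lo, mid) + self._build(mid, hi)
        return self.sub[(lo, hi)]

    def prefix(self, t):
        """SUM(A[0]:A[t]), descending exactly one child per level."""
        lo, hi, total = 0, self.n, 0
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if t < mid:                     # left box covers t: descend into it
                hi = mid
            else:                           # left box lies wholly before t
                total += self.sub[(lo, mid)]
                lo = mid
        return total + self.sub[(lo, hi)]   # leaf box = the target cell

    def range_sum(self, lo, hi):
        """Inverse property of addition, as in Figure 4 (1-D case)."""
        return self.prefix(hi) - (self.prefix(lo - 1) if lo > 0 else 0)

    def update(self, t, value):
        """Bottom-up: locate t's leaf, then add the difference to its ancestors."""
        path, lo, hi = [], 0, self.n
        while True:
            path.append((lo, hi))
            if hi - lo == 1:
                break
            mid = (lo + hi) // 2
            lo, hi = (lo, mid) if t < mid else (mid, hi)
        diff = value - self.A[t]
        self.A[t] = value
        for box in reversed(path):          # leaf first, root last
            self.sub[box] += diff
        return len(path)                    # one box per level: O(log n)

cube = DynamicCube1D([5, 2, 7, 1, 3, 8, 4, 6])
before = cube.prefix(5)
touched = cube.update(2, 10)
after = cube.prefix(5)
```

The 1-D update is cheap because each box holds a single number. In two or more dimensions each affected box must also revise its row sum values, which is the source of the higher update cost discussed below.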

The value of a cell can be updated by descending a single path in the tree. This follows from the construction of overlay boxes: at any level of the tree, an update to a cell affects only the overlay box that contains it; other overlay boxes are unaffected. The update algorithm uses a bottom-up approach. It first traverses the tree to the leaf associated with the target cell. When the leaf is reached, the algorithm determines the difference between the old and new values of the cell and stores the new value into the cell. The difference is then used to update overlay box values in ancestor nodes of the tree. Assume, as shown in Figure 6, that the marked cell is to be updated from 5 to 6. The cell in overlay box N at the leaf level is updated to the new value 6. The difference (+1) is used to update overlay box V at level 1 and eventually T at the root level. Only one overlay box is updated at each tree level; therefore, the cost of updating the Basic Dynamic Data Cube is $O(\log n)$ plus the cost of updating the values in these overlay boxes. However, updates to overlay boxes can be expensive. It
can be shown that the worst-case update cost of the Basic Dynamic Data Cube 
becomes 

A recursive decomposition technique can be used to further reduce the update complexity of dynamic data cubes. The idea of this decomposition is as follows. An overlay box of d dimensions has d groups of row sum values, and each group is $(d-1)$-dimensional. We observe that each group of row sum values has the same internal structure as array P. This concordance suggests that the two-dimensional row sum value planes be stored as two-dimensional data cubes using the techniques already described. Thus, the overlay box values of a d-dimensional data cube can be stored as $(d-1)$-dimensional data cubes using Dynamic Data Cubes, recursively; when d = 2, we use a variation of a B-tree to store the row sum values. Our preliminary analysis indicates that the query and update cost under this recursive decomposition is $O(\log^d n)$ [GAE00].

4 Concluding Remarks 

For many application domains data is sparse or clustered. Examples include most 
geographically-based information, such as geographically oriented business data 
(e.g., sales by region, median income of households by region, etc.), and scientific
measurements (e.g., levels of carbon monoxide production at numerous points on 
the Earth’s surface, locations of stars in space). Still other applications require 
that data be allowed to grow dynamically in any direction, rather than in a single 
dimension as with append-only databases. Current techniques for data cubes do 
not handle these cases well. For these and other potential application domains, 
a method that achieves sublinear performance for both queries and updates is 
needed. The method should permit the data cube to grow dynamically in any 
direction to suit the underlying data, and should handle sparse or clustered data 
efficiently. 
Abstract. The R-tree is widely used for indexing multidimensional and spatial data. When it is used in a COW (Cluster Of Workstations) environment, to the best of our knowledge it is used in Master-Slave mode or one of its variants. Under this mode, an entry must be locked when it is accessed, and the higher the locked entry is in the R-tree, the more transactions are prevented from accessing all of its descendants. The degree of concurrency is therefore severely reduced. Moreover, the Master becomes a hot spot in parallel processing, which can worsen the overall performance. In this paper we present an upgraded parallel R-tree model that is suitable for the COW environment. Our parallel R-tree not only balances the load among processors but also decreases intra- and inter-transaction conflicts, and it prevents hot spots from emerging when the R-tree is accessed. The detailed searching algorithm and the load shipping algorithm of this parallel R-tree are given. Finally, we provide experimental results which show that the parallel R-tree performs well, as expected.



1 Introduction 

With the development of multiprocessors and huge-capacity disks, complex multidimensional data objects such as maps and images can be processed in a massively parallel mode. In systems like CAD and GIS, a good index structure is indispensable for efficient access to spatial data. In general, databases of spatial objects are large, and so are their index files; therefore the time needed for indexing is considerably long. One way to solve this problem is to parallelize the indexing process.

A Cluster of Workstations (COW) has a shared-nothing architecture [6] and inherits that architecture's good scalability and availability. In addition, as it is relatively cheap compared with other parallel architectures, it is an ideal platform for parallel databases in practical use. The COW structure can be sketched as follows: a number of independent components, each consisting of a processor, a considerable amount of memory and disks, are connected through a high-speed network. Under this architecture, the system bottlenecks are still the I/O subsystems and the communication subsystems [4].

The R-tree is a prevalent dynamic index structure for multidimensional or spatial data. However, only a few works have investigated parallel R-tree structures on the COW architecture. To the best of our knowledge, these R-trees are the M-Rtree [3], the MC-Rtree [3] and the GPR-Tree [4]. In this paper we propose an upgraded R-tree architecture which not only prevents the emergence of hot spots, but also balances the load evenly among the processors. It can also reduce inter-transaction and intra-transaction conflicts.

The rest of the paper is organized as follows. In Section 2 we introduce the R-tree and several parallel R-trees, and analyze why a new parallel R-tree structure for COW is needed. In Section 3 we present the structure of the upgraded R-tree and the parallel operation algorithms on it. In Section 4, we describe the experimental environment and show the results. Concluding remarks appear in Section 5.

2 R-tree and Related Work 

2.1 R-tree 

The R-tree [2] is similar to the B-tree in that it is also a balanced tree, i.e., all of its leaves are at the same depth. The R-tree is also a dynamic tree: insert and delete operations are applied directly to the existing tree, which does not need to be reconstructed.

Each leaf entry of an R-tree has the form (I, tuple-identifier). The tuple-identifier is a pointer to a record in the database, and I is an N-dimensional rectangle $I = (I_0, I_1, \ldots, I_{N-1})$. Here $I_i$ is the smallest interval $[a, b]$ covering the record in the $i$th dimension; a, b, or both may be infinite. That is, the N-dimensional rectangle is the smallest one that contains the record.

Each entry of a non-leaf (internal) node of an R-tree has the form (I, child-pointer). The child-pointer points to the next level of the sub-tree, and I is the smallest rectangle that contains all the records in that sub-tree. Suppose M is the maximum number of sub-trees (entries) of a node, and $m \le M/2$ is the minimum number of sub-trees that a node must have. Then the height of an R-tree that contains N index records is at most $\lceil \log_m N \rceil - 1$. Similarly, the maximum number of nodes in an R-tree that contains N index records is $\lceil N/m \rceil + \lceil N/m^2 \rceil + \cdots + 1$. The worst-case space utilization of any node except the root is m/M.
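These bounds are easy to evaluate for concrete parameters. This is a small sketch assuming integer N ≥ 1 and 2 ≤ m ≤ M/2. It uses integer arithmetic instead of a floating-point logarithm to avoid rounding errors at exact powers of m. The function names are hypothetical.

```python
def max_height(N, m):
    """Upper bound ceil(log_m N) - 1 on the height of an R-tree with N records."""
    k, capacity = 0, 1
    while capacity < N:          # smallest k with m**k >= N, i.e. ceil(log_m N)
        capacity *= m
        k += 1
    return max(k - 1, 0)

def max_nodes(N, m):
    """Upper bound ceil(N/m) + ceil(N/m^2) + ... + 1 on the number of nodes."""
    total, level = 0, N
    while level > 1:
        level = -(-level // m)   # ceiling division: nodes needed one level up
        total += level
    return total

def worst_utilization(m, M):
    """Every non-root node holds at least m of its M entry slots."""
    return m / M
```

For example, with N = 1000 and m = 10 the tree has height at most 2 and at most 100 + 10 + 1 = 111 nodes.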

Below is the searching algorithm of the R-tree. Other algorithms, such as insertion and deletion, can be found in the related references.

Although the R-tree is an extension of the B-tree, its searching process is quite different from that of the B-tree. The reason is that several leaf nodes generally have to be visited in an R-tree, whereas in a B-tree only one leaf node has to be visited.
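The following is a minimal sketch of the standard R-tree search of [2]. It assumes closed axis-aligned rectangles stored as per-dimension (low, high) pairs. The `Node` layout and the function names are hypothetical. Every child whose rectangle overlaps the query must be followed, which is why several leaves may be visited.

```python
from dataclasses import dataclass, field

def overlaps(a, b):
    """Closed boxes ((lo, hi), ...) intersect iff they overlap in every dimension."""
    return all(alo <= bhi and blo <= ahi for (alo, ahi), (blo, bhi) in zip(a, b))

@dataclass
class Node:
    leaf: bool
    # (rectangle, child Node) in internal nodes; (rectangle, tuple-identifier) in leaves
    entries: list = field(default_factory=list)

def search(node, query):
    """Tuple-identifiers of all leaf entries whose rectangle overlaps query.
    Every overlapping child is followed, so several leaves may be visited."""
    hits = []
    for rect, ref in node.entries:
        if overlaps(rect, query):
            if node.leaf:
                hits.append(ref)
            else:
                hits.extend(search(ref, query))
    return hits

leaf1 = Node(True, [(((0, 1), (0, 1)), "t1"), (((2, 3), (2, 3)), "t2")])
leaf2 = Node(True, [(((4, 5), (0, 1)), "t3"), (((6, 7), (6, 7)), "t4")])
root = Node(False, [(((0, 3), (0, 3)), leaf1), (((4, 7), (0, 7)), leaf2)])
result = search(root, ((1, 4), (0, 2)))   # follows both root entries
```

In this example the query rectangle overlaps both root entries, so both leaves are searched. A B-tree lookup would descend only one path.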

The intrinsic parallelism of R-tree. [1] As stated in the previous section, the searching process in an R-tree differs from that in a B-tree: an R-tree search often follows more than one path. Different searching strategies, such as left-precedence or depth-precedence, can be used to search R-trees, as Figure 1 shows.
Fig. 1. An R-tree and its search paths



The order in right-precedence: a→d→h→p→o→g→n→c→f→k
The order in width-precedence: a→c→d→f→g→h→k→n→o→p

In the above serial traversals, parallel I/O and multiple processors are not fully utilized. However, the search can be transformed into a set of sub-paths. For example, the search paths to the leaves (drawn in bold) in Figure 1 can be transformed into the 4 sub-paths in Figure 2. Path1, Path2, Path3 and Path4 are of the same depth, so they can be processed in parallel. Each of these paths (four nodes) is shorter than the serial traversal (ten nodes), so performance improves.




(Path1: a-c-f-k)
(Path2: a-d-g-n)
(Path3: a-d-h-o)
(Path4: a-d-h-p)



Fig. 2. An R-tree and its search paths 
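The two serial orders and the decomposition into independent sub-paths can be reproduced with a short sketch. It includes only the visited nodes of Figure 1, using an assumed adjacency list that is consistent with the traversal orders and the four paths listed above. The function `visit` is a hypothetical stand-in for reading the pages along one path, and the paths are run concurrently with a thread pool.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Visited nodes of Figure 1, children listed left to right (an assumed
# adjacency, consistent with the two traversal orders and Path1-Path4).
children = {"a": ["c", "d"], "c": ["f"], "d": ["g", "h"],
            "f": ["k"], "g": ["n"], "h": ["o", "p"]}

def right_precedence(node):
    """Serial depth-first traversal that visits the rightmost child first."""
    order = [node]
    for child in reversed(children.get(node, [])):
        order += right_precedence(child)
    return order

def width_precedence(root):
    """Serial breadth-first (level-by-level) traversal."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children.get(node, []))
    return order

def sub_paths(node, prefix=()):
    """Decompose the search into independent root-to-leaf sub-paths."""
    path = prefix + (node,)
    kids = children.get(node, [])
    if not kids:
        return ["-".join(path)]
    return [p for child in kids for p in sub_paths(child, path)]

def visit(path):
    """Hypothetical stand-in for reading the pages along one sub-path."""
    return path.split("-")[-1]

paths = sub_paths("a")
with ThreadPoolExecutor(max_workers=len(paths)) as pool:
    leaves = list(pool.map(visit, paths))   # the 4 sub-paths run concurrently
```

Each sub-path has only four nodes, whereas either serial order visits all ten. `pool.map` returns results in input order, so `leaves` lines up with `paths`.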



2.2 Parallel R-tree 

M-Rtree. [3] In M-Rtree, one dedicated machine becomes the master server and 
contains all internal nodes of the parallel R-tree. The leaf level at the master 
does not hold the leaf level of the global tree, but rather (MBR, site-id, page- 
id) tuples for each global leaf level node. The leaf nodes of the global tree are 
declustered across the other sites. The (site-id, page-id) is used to designate at 
which page and which site the leaf level page is located. 







S. Lai, F. Zhu, and Y. Sun 



MC-Rtree. [3] The data at the master machine is organized the same way as 
suggested by the M-Rtree, i.e. only non-leaf nodes are stored at the master site. 
Unlike the M-Rtree, only the (MBR, site-id) pairs are needed at the leaf level of 
the master site, not the page-ids. Each client builds a complete R-tree for the 
portion of the data assigned to it. There is redundant information stored in the 
MC-Rtree compared to the M-Rtree. 

GPR-Tree. [4] The GPR-tree uses a global index tree shared by multiple PUs (Processor Units) in the system. The PUs maintain the consistency of the index by exchanging messages. The memory of each PU holds a fraction of the GPR-tree, and these in-memory R-tree nodes are classified into two groups: nodes marked LOCAL, whose corresponding page resides on the local disk, and nodes marked REMOTE, whose page resides on a remote PU. A node marked LOCAL is the copy in charge of maintaining the coherency of the multiple copies in the system, while copies marked REMOTE are used to increase the parallelism of the system. The in-memory nodes of the tree are scheduled by a page scheduling algorithm according to the priority of each node.

All of the above parallel R-trees have shortcomings in their structures that undermine parallelism. For example, the M-Rtree and MC-Rtree adopt the Master-Client mode, so the Master always becomes the access hot spot and a bottleneck that increases the response time of transactions. The GPR-Tree avoids some drawbacks of the MC-Rtree, but it cannot control the load balance among the processors. Moreover, none of them considers how to decrease inter-transaction and intra-transaction conflicts when multiple transactions are processed concurrently.

2.3 The Shortcomings of Conventional R-trees 

The purpose of research on index structures, whether search trees or hash tables, is to make the access cost and disk load as low as possible while allowing the index to be accessed concurrently by multiple transactions. In conventional search trees, index access usually begins at the root, so the root will inevitably become a hot spot when multiple transactions are processed concurrently. If an index key has to be inserted or deleted, then the corresponding part of the index has to be locked. This causes many conflicts and degrades the performance of the whole DBMS. The problem is even more pronounced on multiprocessors, when several processors access the same database in parallel.

To sum up, the drawbacks of the conventional R-tree structure are as follows:

1. The higher a node is in the tree, the higher its probability of being visited. The root is visited by every transaction, so it easily becomes a hot spot [5].

2. When a node splits (or merges) because of an insertion (deletion), nearby nodes have to be modified too. In the worst case, all the nodes from the root to the leaf have to be modified.




A Design of Parallel R-tree on Cluster of Workstations






3. In any transaction, a shared lock is placed on a node when it is accessed, and an exclusive lock when it is updated. The higher a locked node is, the more nodes are effectively locked and the more conflicts arise. In the extreme case the root is locked, and so is the whole tree. No other transaction can then access the index, so concurrency is at its lowest.

4. The R-tree is effective for exact-match queries but not for range queries. Because each leaf can only be contained in nodes of the upper level, rather than intersecting with them, partitioning the ranges is difficult. In some cases such partitioning is impossible because almost all the data objects intersect.

We propose an upgraded index structure for COW that avoids the emergence of access hot spots. As few nodes as possible are locked when keys are inserted or deleted, so inter-transaction and intra-transaction conflicts are also few. Moreover, the upgraded parallel R-tree can efficiently balance the workload and achieve a better speedup.



3 Upgraded Parallel R-tree 

3.1 Upgraded Parallel R-tree Structure 

The best way to overcome the shortcomings explained above is to split an R-tree into several sub-trees, as shown in Figure 3.
Fig. 3. Upgraded R-tree Index Structure
First, a proper partition function is chosen to split the area to be indexed into several sub-areas. The partition function is determined by the user according to the spatial distribution of the data objects in question. If a sub-area still contains too many data objects, it can be divided further into smaller areas (this process can be repeated until each area is of a proper size). The data objects in each sub-area are then indexed using an R-tree. The whole index structure is thus composed of:

- a number of sub-trees;
- the data partitioning functions;
- a Primary Mapping Category and a Secondary Mapping Category determined by the partitioning functions.

Every processor has an item in the Primary Mapping Category. Each item is made up of two parts: the ID number of the corresponding processor and the number of data objects stored on it. Each item in the Secondary Mapping Category also comprises two parts: a pointer to a sub-tree and the number of data objects stored in that sub-tree. Every sub-tree is an R-tree, but here each leaf node entry has two items: the ID of the processor where the data resides, and the position of the data on that processor.

Under this index structure, a transaction that operates on a sub-tree locks only that sub-tree. The other sub-trees remain untouched, and operations on them can proceed in parallel. Therefore the concurrency of transactions is greatly improved and access hot spots are avoided. In addition, because each leaf entry records the home processor ID of the data, the data partitioning strategy does not influence the index structure; the index structure is independent of the data partitioning method.

The input of a partition function is the space occupied by a data object, and 
its output is the set of IDs of the processors where the index of the data object 
resides. In Fig. 3, the input of the function PF_k is likewise the space occupied 
by a data object, while its output is the set of roots of the sub-trees where the 
index of the data object resides. 
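A minimal Python sketch of such a partition function follows. It assumes a uniform grid of rectangular sub-areas and a round-robin assignment of sub-areas to processors; `Rect`, `make_grid` and `partition_function` are hypothetical names, since the paper leaves the actual choice of function to the user. The sketch shows why one object may map to several processor IDs: its bounding rectangle can overlap more than one sub-area.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned minimum bounding rectangle (MBR)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Rect, b: Rect) -> bool:
    """True if two closed rectangles overlap (touching edges count)."""
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax


def make_grid(space: Rect, nx: int, ny: int) -> list[Rect]:
    """Split the indexed space into nx * ny equal sub-areas (an illustrative choice)."""
    w = (space.xmax - space.xmin) / nx
    h = (space.ymax - space.ymin) / ny
    return [Rect(space.xmin + i * w, space.ymin + j * h,
                 space.xmin + (i + 1) * w, space.ymin + (j + 1) * h)
            for j in range(ny) for i in range(nx)]


def partition_function(obj: Rect, sub_areas: list[Rect], n_procs: int) -> set[int]:
    """IDs of the processors whose sub-areas the object's MBR touches.
    Sub-area k is assumed to live on processor k % n_procs (round-robin)."""
    return {k % n_procs for k, area in enumerate(sub_areas) if intersects(obj, area)}
```

For a 2 x 2 grid over a 100 x 100 space, an object lying wholly in the lower-left cell maps to `{0}`, while one straddling the vertical midline maps to `{0, 1}`.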

To reduce the communication cost, every processor holds a complete R-tree, 
although only part of the database resides on each processor. 

3.2 The Implementation of the Parallel Operations on R-tree 

If we use this parallel R-tree for indexing, we must partition the area of the 
R-tree into several sub-areas using straight lines, curves, or even curved surfaces 
in the case of multi-dimensional space. The N sub-areas need not be the same 
size; however, to balance the workload on the PUs, the data objects should be 
evenly distributed among the sub-areas. Each sub-area holds one R-tree, or several 
if it is split again, so each tree indexes approximately the same number of data 
objects. 
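One way to obtain sub-areas with equal object counts but unequal sizes is to place the split lines at quantiles. The sketch below assumes straight vertical split lines and represents each object by the x-coordinate of its centre. The helper names are illustrative and do not come from the paper.

```python
import bisect
from collections import Counter


def balanced_splits(xs: list[float], n: int) -> list[float]:
    """n-1 vertical split lines placed at quantiles of the object centres,
    so each strip gets roughly len(xs) / n objects (ties can unbalance it)."""
    s = sorted(xs)
    m = len(s)
    return [s[(k * m) // n] for k in range(1, n)]


def strip_of(x: float, splits: list[float]) -> int:
    """Strip index of coordinate x; strip k covers [splits[k-1], splits[k])."""
    return bisect.bisect_right(splits, x)


# Skewed data: the strips get very different widths but equal object counts.
centres = [1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 50.0, 100.0]
splits = balanced_splits(centres, 2)
print(splits, Counter(strip_of(x, splits) for x in centres))
```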

Algorithm of Parallel Inserting and Deleting. Inserting and deleting operations 
on this upgraded R-tree are similar to those on the conventional R-tree, except 
that the partition function and sub-partition function are applied first to 
determine on which tree(s) the indices of the operands reside. 
The algorithm of parallel inserting is shown below: 
Parallel_Insert(Area RegionX) 
{ 
    for all processors in the COW Do 
    { 
        int k = Partition_Function(RegionX);   // see Figure 3 
        Node rtreeRoot = PF_k(RegionX);        // see Figure 3 
        Insert(rtreeRoot, RegionX);            // insert RegionX into the R-tree whose root 
                                               // is rtreeRoot; this insert procedure is 
                                               // the same as described in [2] 
    } 
} 

The algorithm of parallel deleting is shown below: 
Parallel_Delete(Area RegionX) 
{ 
    for all processors in the COW Do 
    { 
        int k = Partition_Function(RegionX);   // see Figure 3 
        Node rtreeRoot = PF_k(RegionX);        // see Figure 3 
        Delete(rtreeRoot, RegionX);            // delete RegionX from the R-tree whose root 
                                               // is rtreeRoot; this delete procedure is 
                                               // the same as described in [2] 
    } 
} 
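The dispatch step shared by Parallel_Insert and Parallel_Delete can be sketched in Python as follows. This is a single-process simplification under stated assumptions: each sub-tree is a plain list standing in for an R-tree, and `PartitionedIndex` is a hypothetical class. Each sub-tree has its own `threading.Lock`, which illustrates the claim that a transaction locks only the sub-trees it touches.

```python
import threading
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Rect, b: Rect) -> bool:
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax


class PartitionedIndex:
    """One sub-tree per sub-area, each guarded by its own lock.
    A Python list stands in for each R-tree (hypothetical simplification)."""

    def __init__(self, sub_areas: list[Rect]) -> None:
        self.sub_areas = sub_areas
        self.sub_trees: list[list[Rect]] = [[] for _ in sub_areas]
        self.locks = [threading.Lock() for _ in sub_areas]

    def targets(self, region: Rect) -> list[int]:
        """Plays the role of the partition function: which sub-trees are affected."""
        return [k for k, area in enumerate(self.sub_areas) if intersects(region, area)]

    def insert(self, region: Rect) -> list[int]:
        ks = self.targets(region)
        for k in ks:
            with self.locks[k]:                    # only this sub-tree is locked
                self.sub_trees[k].append(region)   # a real system calls R-tree Insert
        return ks

    def delete(self, region: Rect) -> list[int]:
        ks = self.targets(region)
        for k in ks:
            with self.locks[k]:
                if region in self.sub_trees[k]:
                    self.sub_trees[k].remove(region)  # a real system calls R-tree Delete
        return ks
```

Operations on disjoint sets of sub-trees never wait on the same lock, which is the source of the concurrency gain described earlier.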



Implementation of Parallel Searching. In order to describe the implemen- 
tation, a definition is given below. 

Definition 1. SMC_k in processor k denotes the set of possible return values of 
the function PF_k, i.e., part of the Secondary Mapping Category in Figure 3. 

The parallel searching algorithm is an extension of the algorithm in [1]. It 
is similar to breadth-first search in an R-tree. The parameters used in the 
algorithms are defined below. Region denotes the area to be searched. CW on 
each processor is a queue of internal nodes whose sub-trees may contain leaf nodes 
pointing to the data to be found. At the beginning, CW in processor k is equal 
to SMC_k. Threshold is a parameter that controls when parallel searching begins: 
when the number of nodes in CW is equal to or greater than Threshold, parallel 
searching begins. The leaf nodes found to satisfy the search conditions, i.e. 
ResultNodes, are returned. 

The procedure of parallel searching is shown in Figure 4. In the first step, 
all the related PUs search their local R-trees while the others are kept idle. If 
the number of nodes in CW on a certain PU exceeds Threshold, the second step, 
i.e. workload distribution, starts. After this step, the global search begins on 
each PU in parallel. This procedure repeats until CW is empty on every PU. 
Fig. 4. The parallel R-tree searching procedure (all PUs: local search; CW on each PU before distributing; workload distributing; CW on each PU after distributing; global searching in parallel; ResultNodes) 

Algorithm of Local Searching. Below is the algorithm of local searching: 
Local_Search(Area Region, int Threshold, Queue CW) 
{ 
    Set ResultNode = ∅; 
    Set pid = Partition_Function(Region); 
    if (the ID of this processor belongs to pid) 
    { 
        FirstNode = the first node in CW; 
        for each direct child node of FirstNode do 
        { 
            LOCK(the child node); 
            if (the child node intersects with Region) 
                if (the child node is not a leaf) 
                    add the child node to the end of CW; 
                else 
                    ResultNode = ResultNode + the child node; 
            UNLOCK(the child node); 
        } 
        CW = CW - FirstNode; 
        if (0 < the number of nodes in CW < Threshold) 
            ResultNode = ResultNode + Local_Search(Region, Threshold, CW); 
        else if (CW is empty) 
            send a message to the other PUs to inform them of its idleness; 
    } 
    else 
    { 
        send a message to the other PUs to inform them of its idleness; 
    } 
    return ResultNode; 
} 
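The local breadth-first step can be sketched in a single process as follows. The sketch assumes a toy in-memory tree (`Node`, where an empty `children` list marks a leaf entry) and omits the LOCK/UNLOCK calls and the partition check. It expands nodes from the front of CW until CW is empty or holds at least `threshold` nodes, and it accumulates results in a loop instead of by recursion.

```python
from collections import deque
from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    name: str
    rect: Rect
    children: list["Node"] = field(default_factory=list)  # empty list = leaf entry


def local_search(region: Rect, threshold: int, cw: deque) -> list[Node]:
    """Breadth-first expansion of the queue CW, which is modified in place.
    Stops when CW is empty (this PU becomes idle) or holds >= threshold nodes;
    the remaining CW is then handed to load distribution."""
    results: list[Node] = []
    while cw:
        first = cw.popleft()
        for child in first.children:
            # Locking of child nodes is omitted in this single-process sketch.
            if intersects(child.rect, region):
                if child.children:
                    cw.append(child)        # inner node: expand it later
                else:
                    results.append(child)   # leaf entry satisfying the query
        if len(cw) >= threshold:
            break
    return results
```

A larger `threshold` keeps the search local for more levels, which matches the trade-off discussed next.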

Why not search in parallel from the beginning, instead of searching only on the 
local processor? Because we have to take the communication cost into account. 
At the beginning, the R-tree to be searched is large; if a processor sends a mass 
of messages to the other processors, a long communication time is inevitable. On 
the other hand, if the search continues a step further on the local processor, the 
sub-tree left to be processed is only part of the original one, and the other parts 
of the tree can be left out of consideration. Moreover, with each layer the search 
descends, the number of nodes in the sub-trees still to be processed decreases 
geometrically. Therefore we should try our best to prune the sub-trees unrelated 
to the search. However, we cannot go too far down the R-tree on a single 
processor, because although the communication cost falls, the other processors 
wait too long. The parameter Threshold is therefore used to balance the two. The 
higher Threshold is, the fewer sub-trees remain to be processed and the lower the 
communication cost, but the longer the other processors have to wait, and vice 
versa. The value of Threshold is also influenced by the search area. If the search 
area is small, only a few nodes need to be queried. If Threshold is then large, by 
the time the number of nodes in CW reaches it, the search may already be near 
the leaves, leaving no room for parallel processing; Threshold should therefore 
be set smaller in this case. Conversely, if the search area is large, Threshold may 
be reached near the root. As explained before, this situation also leads to poor 
performance, so Threshold should be set larger in this case. 



Algorithm of Load Distributing. We use the queue CW produced by the above 
algorithm to allocate and balance the load among processors. It works as follows: 

The nodes in CW are ordered by the number of nodes in their sub-trees. If there 
are idle processors, then, to reduce the communication cost, the processor with 
the fewest data objects sends the position in the R-tree of a node in its CW to an 
idle processor. This repeats until there are no idle processors left or only one 
node is left in its CW. After that, if the number of nodes in that processor's CW is 
larger than the number of idle processors, the remaining nodes are processed on 
the local processor. Otherwise, if the number of nodes in CW is smaller than the 
number of idle processors, the positions in the R-tree of the nodes in the CW of 
the processor with the second fewest data objects are sent to the processors that 
are still idle. If idle processors still remain, the above process is repeated. When 
an idle processor receives the message, it finds the node at the given position and 
sets its CW to this node. 







S. Lai, F. Zhu, and Y. Sun 



Below is the algorithm of load allocation: 

Load_Distribute(int i, int NumberOfIdle) 
{ 
    if (NumberOfIdle == 0 or i >= the number of PUs) return; 
    P = the ID of the PU with the i-th least data objects; 
    NumberOfNode = the number of nodes of CW in processor P; 

    if (NumberOfIdle < NumberOfNode - 1) 
    { 
        // fewer idle PUs than spare nodes: each idle PU receives one node 
        while (NumberOfIdle != 0) 
        { 
            RNode = the node at the tail of the CW queue; 
            Position = the position where RNode resides in the R-tree; 
            processor P sends Position to the next idle processor 
                (round-robin or another method); 
            CW = CW - RNode; 
            NumberOfIdle = NumberOfIdle - 1; 
        } 
    } 
    else 
    { 
        // enough idle PUs: ship every node except the first one 
        while (the number of nodes in CW > 1)   // the first node in CW is left for local processing 
        { 
            RNode = the node at the tail of the CW queue; 
            Position = the position where RNode resides in the R-tree; 
            processor P sends Position to the next idle processor 
                (round-robin or another method); 
            CW = CW - RNode; 
            NumberOfIdle = NumberOfIdle - 1; 
        } 
        // NumberOfIdle has already been decremented in the loop above 
        Load_Distribute(i + 1, NumberOfIdle); 
    } 
} 
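The distribution logic above can be simulated as follows. The sketch assumes every PU's CW is held in one dictionary, and it uses node identifiers in place of the R-tree positions that are actually sent. In the real system every PU holds the whole R-tree, so a position is enough. `load_distribute` is a hypothetical name. It visits busy PUs in ascending order of stored data objects, gives each idle PU one node from the tail of a queue, and always keeps the first node local.

```python
from collections import deque


def load_distribute(cws: dict[int, deque], data_count: dict[int, int],
                    idle: list[int]) -> dict[int, deque]:
    """Ship tail nodes of busy PUs' CW queues to idle PUs (one node per idle PU)."""
    waiting = list(idle)  # idle PU IDs, served in order (round-robin style)
    # Visit busy PUs starting from the one that stores the fewest data objects.
    busy = sorted((p for p, cw in cws.items() if cw), key=lambda p: data_count[p])
    for p in busy:
        cw = cws[p]
        # The first node of CW always stays on P for local processing.
        while len(cw) > 1 and waiting:
            node = cw.pop()              # node at the tail of P's queue
            target = waiting.pop(0)
            cws[target] = deque([node])  # the receiver sets its CW to this node
        if not waiting:
            break
    return cws
```

With PU 1 (5 objects) holding `d, e`, PU 0 (10 objects) holding `a, b, c`, and PUs 2 and 3 idle, PU 1 ships `e` to PU 2 and PU 0 ships `c` to PU 3.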



Algorithm of Parallel Searching. The parallel searching algorithm is the 
interface of the R-tree search operation. It calls two functions: Local_Search and 
Load_Distribute. First, it calls Local_Search. Second, if there are idle PUs, each 
processor invokes the Load_Distribute function to ship some workload to the idle 
PUs. Then Local_Search is executed on all PUs, producing a new value of CW on 
each PU. When a processor has completed its work, it broadcasts a message to all 
the other processors to announce its idleness. When the other processors receive 
the message, they allocate the workload again. The whole process repeats until 
CW on every PU is empty. It then returns all the leaf nodes found to satisfy the 
search conditions. Below is the main algorithm of parallel searching: 
Parallel_Search(Area Region, int Threshold, Queue CW) 
{ 
    for all processors in the COW Do 
    { 
        ResultNode = ∅; 
        SubWorkload = ∅; 
        For all nodes in CW Do 
        { 
            firstNode = the first node in CW; 
            ResultNode = ResultNode + Local_Search(Region, Threshold, SubWorkload); 
            CW = CW - firstNode; 
            CW = CW + SubWorkload; 
        } 
        While (CW NOT empty) Do 
        { 
            int NumberOfIdle = the number of the idle processors; 
            if (NumberOfIdle > 0) Load_Distribute(0, NumberOfIdle); 
            ResultNode = ResultNode + Local_Search(Region, Threshold, CW); 
        } 
        return ResultNode; 
    } 
} 



3.3 Cost 

Compared with serial R-tree operations, parallel operations incur extra costs, 
including the costs of data transfer and synchronization. The total cost of 
parallel R-tree processing is [1]: 

$$T_{\mathit{cost}} = \mathit{I/O} + \mathit{CPU} + \mathit{DataTransfer} + \mathit{Synchronization}$$ 

Data Transfer includes the time for load transfer and returning the results. 

Synchronization comprises the time for sub-queries, locking, and synchronization 
(in transaction management). I/O and Synchronization costs are influenced by 
the system environment. As the time needed for CPU processing and data transfer 
cannot overlap, we can see from the above equation that, to make the total cost 
as small as possible, the CPU processing cost and the data transfer cost must be 
balanced rationally. That is, the initial distribution of data and the value of 
Threshold must be carefully managed. 
4 Experimental Results 

4.1 Experimental Environment 

BSP Model. In our implementation of a parallel DBMS, we use the BSP (Bulk 
Synchronous Parallelism) model [7,8] developed by the University of Oxford. We 
also use it, together with BSPLib [9], in the simulation experiments that test the 
performance of our upgraded parallel R-tree. BSP is a style of parallel programming 
developed for general-purpose parallelism, that is, parallelism across all application 
areas and a wide range of architectures. Its fundamental properties are: 1. it is 
simple to program; 2. it is independent of target architectures; 3. the performance 
of a program on a given architecture is predictable. 




Fig. 5. The speedup with different database sizes ((a) DB size is 40000 records; (b) DB size is 20000 records; x-axis: number of PUs) 



Experimental platform. The experiments are run on six “Sun Sparc 5” 
workstations connected by a 10M hub. The OS is Solaris. Using Oxford BSPLib 
4.2, we implement the parallel environment on the COW. To test the performance 
of the upgraded parallel R-tree, we use two SQL statements: 
1. Select * from DB1 where (DB1.Area intersect with a given area) 

2. Insert into DB1 from (Select * from DB2) 

The two statements are executed in parallel. 



4.2 Experiment Results 

The standard measure of the efficiency of a parallel system is the speedup $s$, 
defined as follows: let $T(n)$ be the response time measured with $n$ sites; 
then $s = T(1)/T(n)$. 
Fig. 6. The speedup with different multi-intersecting rates 



The impact of database size on speedup is shown in Figure 5. The different 
curves in each panel denote the cases in which the number of sub R-trees on 
each PU is 1, 4, and 8, respectively. We can see from the figures that the number 
of R-trees on each PU influences the speedup. When the database grows from 
10000 to 40000 records, the speedup improves noticeably. For example, when 
each PU holds 4 sub R-trees and 6 PUs are used, the speedup is 2.93 for a 
database of 10000 records. When the database size doubles, the speedup becomes 
28% higher; when the database grows to 40000 records, the speedup is 35.5% 
higher. So our parallel R-tree is more suitable for large-scale database processing. 
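The figures quoted above can be checked against the definition $s = T(1)/T(n)$. The check below reads "28% higher" and "35.5% higher" as relative to the 10000-record speedup of 2.93, which is our interpretation of the text.

```python
def speedup(t1: float, tn: float) -> float:
    """s = T(1) / T(n): response time on one site over response time on n sites."""
    return t1 / tn


base = 2.93                    # 10000 records, 4 sub R-trees per PU, 6 PUs (from the text)
s_20000 = base * (1 + 0.28)    # "28% higher" when the database doubles
s_40000 = base * (1 + 0.355)   # "35.5% higher" at 40000 records
print(round(s_20000, 2), round(s_40000, 2))
```

With 6 PUs, these speedups correspond to a parallel efficiency $s/n$ of roughly 0.49, 0.63, and 0.66.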

Figure 6 shows the impact of the multi-intersecting rate. If a data object 
intersects multiple sub-areas, the index of the data object is allocated to the 
R-trees corresponding to all these sub-areas. For instance, in Figure 7, data 
object R1 intersects both sub-area I and sub-area II, so its index is inserted 
into the R-trees of both sub-areas. The multi-intersecting rate is defined as the 
percentage of data objects that intersect multiple sub-areas. As stated in Section 
3, we split the area and distribute the data objects to the PUs to gain parallelism 
and concurrency. However, if the multi-intersecting rate is too high, the sub-trees 
grow so rapidly that the workload on each PU becomes too heavy to maintain the 
advantage of parallel processing. In Figure 6, when the multi-intersecting rate is 
25%, the advantage of parallel processing can be clearly seen, as the speedup goes 
up when the number of processors increases. When the multi-intersecting rate is 
50%, however, the speedup curves are quite different. Parallel processing still pays 
off while the number of processors is at most 4, as (b) shows the speedup curves 
still climbing. However, the speedup curves drop when the number of processors 
exceeds 4, because the sub-trees on each PU are too many to be processed. 
Fig. 7. Illustration of multi-intersecting 
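The multi-intersecting rate, and the index growth it causes, can be computed as in this sketch. It assumes axis-aligned rectangles for both data objects and sub-areas, and the function name is illustrative.

```python
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def multi_intersecting_stats(objects: list[Rect], sub_areas: list[Rect]) -> tuple[float, int]:
    """Return (multi-intersecting rate, total number of index entries).
    Every sub-area an object touches receives one index entry for it."""
    touched = [sum(intersects(o, a) for a in sub_areas) for o in objects]
    rate = sum(t > 1 for t in touched) / len(objects)
    return rate, sum(touched)
```

For example, with two sub-areas split at x = 50 and four objects, two of which straddle the boundary, the rate is 50% and the sub-trees together hold six index entries instead of four.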



5 Conclusion 

In this paper we introduce the R-tree and its variants in parallel databases. We 
analyze the shortcomings of conventional R-trees and propose an upgraded parallel 
R-tree architecture. The parallel algorithms on it, such as inserting, deleting, 
searching, and load allocation, are presented in detail. We use this new R-tree in 
our implementation of a parallel object-relational DBMS. The experiments show 
that when the volume of data to be processed is large and the data structures are 
complex, it performs much better than the conventional R-tree. 
Abstract. Interactive design of an n-ary tree data structure for F-rep 
(functionally represented) geometric models is discussed. The interactive 
Construction Tree tool, which is part of an F-rep modeler, is designed to 
construct a tree structure and convert it to a HyperFun language description. 
With the developed tool, a model in HyperFun can also be read and the 
corresponding graphical tree displayed. The user can interact with the 3D 
geometric model through the graphical tree. Using the interactive design of a 
construction tree, the user will be able to create HyperFun models more easily 
than before. The geometric data structures will be incorporated into a design 
database for intended use on clusters of computer servers, allowing low-end 
clients access to advanced geometric modeling over the Internet. 



1 Introduction 

Geometric data structures depend very much on the underlying mathematical 
models and possible queries to them [1]. For example, a polygonal surface model 
is represented by lists of vertices, edges, and polygons with references to each 
other. While operating with a graphical user interface, geometric designers can 
seamlessly create corresponding data structures by saving the current shape. 
However, direct manipulation of the graphical representation of the geometric 
data structure can be helpful too. For example, displaying and manipulating a 
graphical tree is quite natural for Constructive Solid Geometry (CSG), which is 
based on set-theoretic (Boolean) operations on primitive solids. 

In this work, we discuss a specific tree-like geometric data structure and a 
corresponding graphical interface that supports direct manipulation of it in the 
distributed modeling environment. The HyperFun project [2] deals with a language 
and software tools for the function representation (F-rep) [3] of geometric objects 
(described in Sect. 2.1) and is oriented to the development of an open system 
architecture for functionally based shape modeling and its applications. One of 
this project’s goals is the creation of a server-side F-rep modeler with an 
extendable graphical user interface (GUI) on the client side running under a 
Web browser. 

The main modeler should be installed on the server side. The client side is respon- 
sible for the interface for creating primitives (cylinder, box, torus, metaballs, con- 
volution surfaces and others), performing set operations (union, subtraction and 

S. Bhalla (Ed.): DNIS 2000, LNCS 1966, pp. 134-147, 2000. 
intersection), and modeling complex shapes using offsetting, blending, sweeping, 
and other operations. The mathematical description of the model is sent to the 
server side to perform such time-consuming tasks as polygonization (conversion 
to polygonal surfaces) or ray tracing for realistic rendering. The main benefit of 
this approach is that it allows a user with a single, less powerful PC to perform 
dynamic simulations created in a powerful cluster-computing environment. Shape 
models in the modeler are represented in a high-level modeling language called 
HyperFun [4]. The HyperFun language is a simple modeling language that 
supports geometric primitives, operations, and relations in F-rep. A model in 
HyperFun can serve as a lightweight protocol for exchanging F-rep models between 
users, modeling systems, or networked computers. A complex geometric object 
can be read and written as a HyperFun program, and the user can work 
interactively with geometric primitives, operations, and relations in F-rep. The 
F-rep modeler with its GUI will be extendable (described in detail in Sect. 2.4): 
the user of the modeler will be able to include new primitives and operations in 
F-rep. 

A Construction Tree GUI (see Sect. 2.7) is part of the interactive client side 
of the F-rep modeler. Complex objects in HyperFun can be formed from simple 
objects by set operations and other unary, binary, and even n-ary operations. 
Therefore, objects can be represented by a tree data structure. In F-rep, a 
construction tree is used, which is an extension of the CSG tree (see Sect. 2.2). 
The Construction Tree GUI allows the user to display this tree structure 
graphically. Moreover, to store information about how objects are constructed 
efficiently, a Design Database will be used (described in Sect. 2.3). The user of 
the GUI can work interactively with the graphical tree of a model and then output 
a corresponding HyperFun program. This is described in detail in Sect. 2.5. Java 
Application Desktop (JAD) is used as a framework for creating the GUI. 

2 Geometric Data Structures 

2.1 HyperFun Language 

HyperFun is a high-level programming language and a simple geometric modeling 
language. The user can create complex geometric models in a few lines of source 
code. A model in HyperFun is processed by the modeling and visualization software 
tools. The HyperFun language is quite easy to learn and use. 

A model in HyperFun can include several geometric objects. Each object is 
defined by a function parameterized by input arrays of point coordinates and 
free numerical parameters. The function is represented with the help of assignment 
statements, conditional selection (“if-then-else”), and iteration statements 
(“while-loop”). The functional expressions are built using conventional arithmetic 
and relational operators, standard functions (’exp’, ’log’, ’sqrt’, ’sin’, etc.), and 
built-in special geometric operations (“|” - union, “&” - intersection, “\” - 
subtraction, “~” - negation, and “@” - Cartesian product). 

HyperFun is intended for describing geometric objects in the form 
$F(x_1, x_2, x_3, \ldots, x_n) \geq 0$ (the so-called F-rep). In F-rep, one can 
define a geometric object by a single continuous function of several variables 
(point coordinates in multidimensional space). Therefore, F-rep is a generalization 
of traditional skeletal implicit surfaces, convolution surfaces, distance-based 
models, CSG, sweeps, and voxel models, and can unify them. An F-rep library 
contains functions representing geometric primitives and transformations. The 
on-line manual for the HyperFun language can be found at the dedicated Web site [4]. 
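The inequality $F(x) \geq 0$ can be made concrete with a small sketch. It defines a sphere as a function and combines functions with min/max. These min/max set operations are one common textbook choice and are an assumption here; HyperFun's own `|`, `&`, `\` and `~` operators may be defined by different functions.

```python
Point = tuple[float, float, float]


def sphere(center: Point, r: float):
    """F(p) = r^2 - |p - center|^2: positive inside, zero on the surface."""
    cx, cy, cz = center
    return lambda p: r * r - ((p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2)


# Simple min/max set operations (an assumption, not HyperFun's definitions).
def union(f, g):
    return lambda p: max(f(p), g(p))


def intersection(f, g):
    return lambda p: min(f(p), g(p))


def subtraction(f, g):
    return lambda p: min(f(p), -g(p))


def negation(f):
    return lambda p: -f(p)


def inside(f, p: Point) -> bool:
    """A point belongs to the object exactly when F(p) >= 0."""
    return f(p) >= 0
```

Because every operation returns another function of the same form, the result of any combination is again an F-rep object and can be combined further.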




2.2 CSG 

CSG [5] [6] stands for Constructive Solid Geometry. CSG is based on a set of 3D 
solid primitives and regularized set-theoretic operations. The 3D solid primitives 
are the traditional ones: block, cylinder, cone, sphere, and torus. Operations 
include union, intersection, difference, translation, and rotation. CSG is a 
powerful tool for combining primitive objects to create more complex objects. 
CSG objects can be extremely complex and deeply nested. CSG objects are finite 
objects and thus respond to auto-bounding and can be transformed. A complex 
solid is represented by a binary tree, usually called a CSG tree. The data 
structure can represent the model as a directed graph. This graph is a 
hierarchical structure for 3D models, shown in Fig. 1. 

The construction tree of F-rep is similar to CSG tree, but has n-ary nodes 
and many more primitives and operations. 
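An n-ary construction tree can be evaluated recursively: a leaf holds a defining function, and an operation node combines the values of any number of children. The sketch below illustrates this. The `Node` class, the `ball` helper and the min/max definitions of the operations are assumptions for illustration, not the modeler's actual data structure.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Point = tuple[float, float, float]


@dataclass
class Node:
    name: str                                    # variable name, e.g. "L1"
    op: str                                      # "leaf", "|", "&", "\\" or "~"
    children: list["Node"] = field(default_factory=list)
    func: Optional[Callable[[Point], float]] = None  # defining function of a leaf

    def evaluate(self, p: Point) -> float:
        """F value of the subtree at point p (min/max set operations assumed)."""
        if self.op == "leaf":
            return self.func(p)
        vals = [c.evaluate(p) for c in self.children]
        if self.op == "|":       # n-ary union
            return max(vals)
        if self.op == "&":       # n-ary intersection
            return min(vals)
        if self.op == "\\":      # first child minus all the others
            return min([vals[0]] + [-v for v in vals[1:]])
        if self.op == "~":       # unary negation
            return -vals[0]
        raise ValueError(f"unknown operation {self.op!r}")


def ball(c: Point, r: float) -> Callable[[Point], float]:
    """Hypothetical leaf primitive: r^2 - |p - c|^2."""
    return lambda p: r * r - sum((pi - ci) ** 2 for pi, ci in zip(p, c))
```

A single union node with three children replaces the two nested binary nodes a CSG tree would need.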

2.3 Design Database 

Complex three-dimensional objects may be formed from simpler objects by ap- 
plying different operations. For example, three-dimensional surfaces may also be 
represented by wireframe models, which essentially model the surface as a set 
of simpler objects, such as line segments, edges, and triangles. The geometric 
data structure carrying information about objects is stored in memory during 
editing or other processing. Long-term storage and retrieval of geometric objects, 
the addition of material and other non-geometric information, large numbers of 
objects, and design history handling are the reasons why a design database is 
necessary [7]. 



2.4 Extendable GUI for F-rep 

The main feature of an F-rep modeling system is its extensibility. The user can 
include new primitives, operations, and relations. F-rep models can easily be made 
available to, and processed in, application software. The system should be 
extendable on different levels, including the GUI. The user of a modeler with the 
GUI should be able to introduce new primitives and operations by filling out 
specified template files off-line. 

2.5 Role of a Construction Tree for an F-rep Modeler 

The F-rep library of HyperFun contains functions representing geometric prim- 
itives and transformations. The library in general includes the most common 
primitives (’Sphere’, ’Torus’, ’Ellipsoid’, ’Cylinder’, ’Blobby object’, ’Metaball 
object’, ’Convolution Surface’, etc.) and transformations (’Blending union / 
intersection’, ’Rotation’, ’Scaling’, ’Twisting’, etc.). Each primitive has its own set 
of arguments. The users can create their own libraries of objects for later reuse. 

HyperFun language supports geometric operators relevant to F-rep that can 
be applied to any functional expressions treated as geometric objects. The geo- 
metric operators allow the user to construct complex solids by combining prim- 
itive shapes in different ways. For example, in the union operation, two shapes 
are added together. With the intersection operation, two shapes are combined 
to make a new shape that consists of the area common to both shapes. With the 
difference operation, an initial shape has all subsequent shapes subtracted from 
it. The following is the simplest example of a program in HyperFun (here, & is 
the symbol of the intersection operation). 

my_model { 
    L1 = hfSphere( ); 
    L2 = hfSphere( ); 
    my_model = L1 & L2; 
} 



A Construction Tree GUI for an F-rep modeler can allow the user to convert 
a HyperFun program into a graphical image of the model tree structure. In this 
case, the “L1 = hfSphere( )” statement is represented as a leaf of the tree, which 
holds the variable name (L1), the primitive type (hfSphere), and the parameters ( ). 
The “my_model = L1 & L2” statement is represented as a binary node and a root 
node of the tree. The binary node, which holds the variable name (NULL) and the 
operation (&), is connected with two leaves (L1, L2). The root node has the 
variable name (my_model). As a result of the conversion, the graphical image of 
the construction tree is obtained (shown in Fig. 2). 
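The statement-to-node conversion described above can be sketched in Python. The sketch handles only the two statement forms used in this example, a primitive call and a single binary operation. The regular expressions, `TreeNode` and `build_tree` were written for this illustration; a full HyperFun parser would be needed in practice.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    name: str                  # variable name, or "NULL" for an unnamed operation
    kind: str                  # "root", "leaf" or "binary"
    label: str = ""            # primitive name (leaf) or operator symbol (binary)
    params: str = ""           # raw parameter text of a primitive call
    children: list["TreeNode"] = field(default_factory=list)


LEAF = re.compile(r"^(\w+)\s*=\s*(hf\w+)\s*\((.*)\)\s*;$")
BINARY = re.compile(r"^(\w+)\s*=\s*(\w+)\s*([|&\\])\s*(\w+)\s*;$")


def build_tree(statements: list[str], model_name: str = "my_model") -> TreeNode:
    nodes: dict[str, TreeNode] = {}
    for line in statements:
        line = line.strip()
        if m := LEAF.match(line):
            name, prim, params = m.groups()
            nodes[name] = TreeNode(name, "leaf", prim, params.strip())
        elif m := BINARY.match(line):
            name, left, op, right = m.groups()
            is_root = name == model_name
            # The model's own operation becomes an unnamed (NULL) binary node
            # under a separate root node, as in the description above.
            op_node = TreeNode("NULL" if is_root else name, "binary", op,
                               children=[nodes[left], nodes[right]])
            nodes[name] = TreeNode(name, "root", children=[op_node]) if is_root else op_node
        else:
            raise ValueError(f"unsupported statement: {line!r}")
    return nodes[model_name]
```

For the three statements of the example program, the result is a root `my_model` whose single child is a `NULL` node labelled `&` with leaves `L1` and `L2`.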




Fig. 2. Conversion from a HyperFun program to a tree 



Using this GUI, the user can operate on nodes in the graphical tree by adding, 
connecting, moving, deleting, copying, or changing them. In Fig. 2 (right), new 
nodes named L3 and L4 are added. The operation (&) of the node named NULL is 
changed to the operation (\). One node (L2) was separated and then connected 
again. With such features of this GUI, the user can create new HyperFun 
programs. The resulting HyperFun file is shown below: 

my_model { 
    L1 = hfSphere( ); 
    L2 = hfSphere( ); 
    L3 = hfEllipsoid( ); 
    L4 = L3 | L2; 
    my_model = L1 \ L4; 
} 
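The reverse direction, writing a tree back out as HyperFun text, can be sketched as a post-order traversal. This is an illustrative assumption rather than the tool's actual exporter: `TreeNode` and `to_hyperfun` are hypothetical, only binary operations are handled, and only the root's unnamed (NULL) child is inlined. Because a post-order walk emits L3 and L2 before L4, the statement order may differ from the listing above while defining the same model.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    name: str                  # variable name, or "NULL" for an unnamed operation
    kind: str                  # "root", "leaf" or "binary"
    label: str = ""            # primitive name (leaf) or operator symbol (binary)
    params: str = ""           # raw parameter text of a primitive call
    children: list["TreeNode"] = field(default_factory=list)


def to_hyperfun(root: TreeNode) -> str:
    """Emit one statement per named node, children before parents."""
    body: list[str] = []
    emitted: set[str] = set()

    def expr(node: TreeNode) -> str:
        if node.kind == "leaf":
            if node.name not in emitted:
                body.append(f"    {node.name} = {node.label}({node.params});")
                emitted.add(node.name)
            return node.name
        left, right = [expr(c) for c in node.children]
        text = f"{left} {node.label} {right}"
        if node.name == "NULL":
            return text        # unnamed operation: inlined (assumed only under the root)
        if node.name not in emitted:
            body.append(f"    {node.name} = {text};")
            emitted.add(node.name)
        return node.name

    rhs = expr(root.children[0])
    return "\n".join([f"{root.name} {{", *body, f"    {root.name} = {rhs};", "}"])
```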
Users do not need to write HyperFun text by hand and can create models 
simply with this GUI. This interactive tool allows the user to interact with and 
understand the HyperFun operations used during the modeling process. The 
output text files in HyperFun can be read by the various applications available 
for the HyperFun language. 



2.6 Using Java for Construction Tree GUI 

These applications can be built either with Java for Linux and Unix platforms or 
with MFC for Windows platforms. Java provides the user with a GUI running 
under a Web browser, independent of the hardware. Our current prototype 
implementation uses Tcl/Tk for the interface part. 

2.7 Construction Tree GUI 

There are five kinds of nodes in this GUI: root, leaf, unary operation, binary 
operation, and n-ary operation. F-rep library primitives and user-defined 
geometric primitives correspond to “Leaf” type nodes of the tree. Other nodes 
are geometric operators in the F-rep library or user-defined HyperFun operations. 
For example, union (“|”) is considered a binary operator, and set-theoretic 
negation is considered a unary operator. A tree needs a single root node. 

Fig. 3. Types of tree nodes (root, unary node, binary node, n-ary node) 



Each node holds a variable name and the parameters of its primitive or operator. 
Both the variable name and the primitive name (or operator name) are displayed 
near each node. Each object in HyperFun is represented as a separate n-ary tree 
built out of such nodes in the GUI. The GUI provides the following functions. 

The “Leaf” button can be clicked and then leaf nodes can be created by 
clicking on the canvas with the left mouse button. The “Root” button, “Single 
node” button, “Binary node” button, and “N-ary node” button operate in a 
similar way. When a node is created, a default name is used as the variable 
name. 

The “Connect” button allows a node to be connected with another node by 
clicking with the mouse button over a node or an edge and then moving the 
pointer over a different node. 
The “Move” button allows nodes and edges to be moved. When the left mouse 
button is pressed over nodes and the mouse is dragged with the button down, 
these nodes are moved. Edges can be moved in the same way. 

The “Delete” button allows nodes and edges to be deleted. When the mouse is 
dragged with the Shift-left button over nodes, several nodes can be selected and 
deleted at the same time. Clicking an edge cuts one node off from the other. 

The “Copy” button copies nodes. Dragging with the Shift-left button copies 
several nodes at once. A default name is used for each copied node. The operator 
name or primitive name is copied with the node. 

The “Cut and paste” button supports cutting nodes once and pasting them over 
and over again, preserving the nodes’ information. 

“Change” button allows the user to change parameters of primitives and 
operators, and the variable name held in the node. 



2.8 Operation of GUI 

We now show an example of the Construction Tree GUI in operation. With the 
“Open file” button, the text file of the model in HyperFun (Fig. 4, left) is read 
and the construction tree structure is displayed on the screen (Fig. 5). The input 
HyperFun model is given in the Appendix. The model is then edited by adding, 
deleting, and moving nodes or changing node data using our GUI. The edited 
model is shown in Fig. 4 (right), and its tree structure is shown in Fig. 6. The 
text file of the model in HyperFun can be written with the “Save file” button 
(see Appendix). 




Fig. 4. Doraemon before and after adding a mustache 



3 Conclusions 

The Construction Tree GUI is part of an F-rep modeler. It is useful for the 
interactive creation of complex objects and for understanding the tree structure 
of HyperFun models. The developed tool can read a HyperFun model and represent it as a 



Graphical Interface for Design of Geometric Data Structures 







Fig. 5. Tree structure: Doraemon without mustache 
graphical tree structure. It supports all necessary editing operations on the tree. 
The output of the GUI is a text file with the model in HyperFun. This text file 
will be useful for various applications developed for the HyperFun language. 

For now, the developed GUI can treat a HyperFun model structure only as a 
binary tree. Currently, the user cannot specify queries to the models using the 
graphical tree structure. Another open question is how to incorporate this data 
structure into a large-scale database with long-term storage and retrieval of 
geometric objects, addition of material and other non-geometric information, 
large numbers of objects, and design history handling. 



Appendix A 

HyperFun model for Fig. 4 (left) with the structure shown in Fig. 5. 

my_model(x[3], a[1]) 
{ 
    array p[3]; 

    array head1Cent[3], hana1Cent[3], xface1Cent[3]; 
    array xface2Cent[3], xkuti1Cent[3]; 
    array eye1Cent[3], eye2Cent[3], eye4Cent[3]; 
    array eye6Cent[3], eye8Cent[3]; 
    array rhige1Cent[3], rhige2Cent[3], rhige3Cent[3], xlhige1Cent[3], 
          xlhige2Cent[3], xlhige3Cent[3]; 

    p[1] = x[1] / 2.5; 
    p[2] = x[2] / 2.5; 
    p[3] = x[3] / 2.5; 

    xx = p[1]; 
    y = p[2]; 
    z = p[3]; 

    -- head 
    head1Cent = [0, 4.5, 0]; 
    head = hfEllipsoid(p, head1Cent, 6, 5.5, 6); 

    -- nose 
    hana1Cent = [0, 6.5, 5.9]; 
    -- hana 
    hana = hfEllipsoid(p, hana1Cent, 0.8, 0.8, 0.8); 

    -- face 
    xface1Cent = [0, 4.5, 0.3]; 
    xface1 = hfEllipsoid(p, xface1Cent, 6, 5.5, 6); 
    xface2Cent = [0, 3.7, 0]; 
    xface2 = hfEllCylZ(p, xface2Cent, 4.7, 4.5); 
    atama1 = xface1 & xface2; 

    -- mouth 
    xkuti1Cent = [0, 4.5, 6]; 
    xkuti1 = hfEllipsoid(p, xkuti1Cent, 4, 4, 4); 
    xkuti2 = 4.5 - y; 
    xkuti = xkuti1 & xkuti2; 

    -- eyes 
    eye1Cent = [0, 4.5, 0.4]; 
    eye1 = hfEllipsoid(p, eye1Cent, 6, 5.5, 6); 
    eye2Cent = [1.5, 7.5, 0]; 
    eye2 = hfEllCylZ(p, eye2Cent, 1.5, 1.4); 
    eye3 = eye1 & eye2; 

    eye4Cent = [-1.5, 7.5, 0]; 
    eye4 = hfEllCylZ(p, eye4Cent, 1.5, 1.4); 
    eye5 = eye1 & eye4; 

    eye6Cent = [0.3, 7.5, 5.5]; 
    eye6 = hfEllipsoid(p, eye6Cent, 0.3, 0.3, 0.3); 
    eye7 = eye3 & (-eye6); 

    eye8Cent = [-0.3, 7.5, 5.5]; 
    eye8 = hfEllipsoid(p, eye8Cent, 0.3, 0.3, 0.3); 
    eye9 = eye5 & (-eye8); 




144 



T. Hibi and A. Pasko 



— head final 
LI = eye9 I eye7 ; 

L2 = LI I atamal; 

L3 = L2 I head; 

L4 = L3 I hana; 

my_model = L4 \ xkuti; 

} 



HyperFun model for Fig. 4 (right), with the structure shown in Fig. 6.

my_model(x[3], a[1])
{
  array p[3];
  array head1Cent[3], hana1Cent[3], xface1Cent[3];
  array xface2Cent[3], xkuti1Cent[3];
  array eye1Cent[3], eye2Cent[3], eye4Cent[3];
  array eye6Cent[3], eye8Cent[3];
  array rhige1Cent[3], rhige2Cent[3], rhige3Cent[3], xlhige1Cent[3],
        xlhige2Cent[3], xlhige3Cent[3];

  p[1] = x[1] / 2.5;
  p[2] = x[2] / 2.5;
  p[3] = x[3] / 2.5;

  xx = p[1];
  y = p[2];
  z = p[3];

  -- head
  head1Cent = [0, 4.5, 0];
  head = hfEllipsoid(p, head1Cent, 6, 5.5, 6);

  -- nose
  hana1Cent = [0, 6.5, 5.9];
  -- hana
  hana = hfEllipsoid(p, hana1Cent, 0.8, 0.8, 0.8);

  -- face
  xface1Cent = [0, 4.5, 0.3];
  xface1 = hfEllipsoid(p, xface1Cent, 6, 5.5, 6);
  xface2Cent = [0, 3.7, 0];
  xface2 = hfEllCylZ(p, xface2Cent, 4.7, 4.5);
  atama1 = xface1 & xface2;

  -- mouth
  xkuti1Cent = [0, 4.5, 6];
  xkuti1 = hfEllipsoid(p, xkuti1Cent, 4, 4, 4);
  xkuti2 = 4.5 - y;
  xkuti = xkuti1 & xkuti2;

  -- eyes
  eye1Cent = [0, 4.5, 0.4];
  eye1 = hfEllipsoid(p, eye1Cent, 6, 5.5, 6);
  eye2Cent = [1.5, 7.5, 0];
  eye2 = hfEllCylZ(p, eye2Cent, 1.5, 1.4);
  eye3 = eye1 & eye2;

  eye4Cent = [-1.5, 7.5, 0];
  eye4 = hfEllCylZ(p, eye4Cent, 1.5, 1.4);
  eye5 = eye1 & eye4;

  eye6Cent = [0.3, 7.5, 5.5];
  eye6 = hfEllipsoid(p, eye6Cent, 0.3, 0.3, 0.3);
  eye7 = eye3 & (-eye6);

  eye8Cent = [-0.3, 7.5, 5.5];
  eye8 = hfEllipsoid(p, eye8Cent, 0.3, 0.3, 0.3);
  eye9 = eye5 & (-eye8);

  -- right moustache
  rh = 0.2;
  rhige1Cent = [2, -3, 4.5];
  rhige1 = hfTorusZ(p, rhige1Cent, 10, rh);
  rhige2Cent = [2, -3.5, 4.5];
  rhige2 = hfTorusZ(p, rhige2Cent, 10, rh);
  rhige3Cent = [2, -4, 4.5];
  rhige3 = hfTorusZ(p, rhige3Cent, 10, rh);

  -- left moustache
  xlhige1Cent = [-2, -3, 4.5];
  xlhige1 = hfTorusZ(p, xlhige1Cent, 10, rh);
  xlhige2Cent = [-2, -3.5, 4.5];
  xlhige2 = hfTorusZ(p, xlhige2Cent, 10, rh);
  xlhige3Cent = [-2, -4, 4.5];
  xlhige3 = hfTorusZ(p, xlhige3Cent, 10, rh);

  rhani = (xx - 0) & (7 - xx);
  rhige4 = rhige1 | rhige2;
  rhige5 = rhige4 | rhige3;
  rhige6 = rhige5 & rhani;
  rhige7 = rhige6 & y;
  xlhani = (xx + 7) & (0 - xx);
  xlhige4 = xlhige1 | xlhige2;
  xlhige5 = xlhige4 | xlhige3;
  xlhige6 = xlhige5 & xlhani;
  xlhige7 = xlhige6 & y;
  hige = xlhige7 | rhige7;

  -- head final
  L1 = eye9 | eye7;
  L2 = L1 | atama1;
  L3 = L2 | head;
  L4 = L3 | hana;
  L5 = L4 \ xkuti;
  my_model = L5 | hige;
}
Abstract. This paper presents a formal framework for similarity-based queries in fuzzy information retrieval systems. A well-known mechanism for improving retrieval recall is to add terms that are broader or more specific than the terms appearing in the original query. The query-expansion devices used here are fuzzy thesauri and hierarchies. The formal framework has three components: T, the set of all the terms in the system; Γ, the set of transformation rules applied on T; and L, the flexible query language. The framework is based on fuzzy set theory.



1 Introduction 

A document information retrieval (IR) system provides references to documents based on two major components: the document representation and the user query. The document representation is typically based on terms, which are the atomic components of documents. The user query is expressed in a query language that is based on these terms and allows user requirements to be combined with the logical operators AND, OR, and NOT [7]. In Boolean IR systems, a term in a document representation is either significant (it occurs at least once in the document) or insignificant (it does not occur at all in the document). Likewise, the terms specified in the user query are treated as completely relevant to the user's information needs. Information retrieval, however, is characterized by imprecision [2]. First, there is imprecision in the document representation. The Boolean IR model cannot take a partial degree of significance into account: in a document one term might be much more significant than another, yet there is no way to distinguish between them. Even with methods that summarize documents based on their contents, the document representation is assumed to be only partial and inexact. Another source of imprecision is the user's vague knowledge of the subject area for which information is being requested. In addition, knowledgeable users would like to be able to express their understanding of the importance or relevance of terms in the desired documents. Evaluating a document with respect to a query is also an imprecise process: the selection method used to retrieve the represented documents that are relevant to the user's query requires a procedure for decision making under uncertainty. With all these sources of imprecision, fuzzy set theory provides a useful tool for information retrieval.
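To make the contrast with the Boolean model concrete, here is a minimal sketch of graded term significance. It assumes that a document is represented as a mapping from terms to significance degrees in [0, 1]. A Boolean system would only allow 0 or 1. The sketch uses Zadeh's standard fuzzy operators (min for AND, max for OR, 1 − x for NOT). These operator choices and the sample degrees are illustrative assumptions, not taken from this paper.

```python
from typing import Dict, Tuple, Union

# A query is a term (str) or a tuple: ("AND", q1, q2), ("OR", q1, q2), ("NOT", q)
Query = Union[str, Tuple]


def degree(doc: Dict[str, float], q: Query) -> float:
    """Degree in [0, 1] to which a fuzzy document representation satisfies q."""
    if isinstance(q, str):
        return doc.get(q, 0.0)          # an absent term has significance 0
    op = q[0]
    if op == "AND":
        return min(degree(doc, q[1]), degree(doc, q[2]))
    if op == "OR":
        return max(degree(doc, q[1]), degree(doc, q[2]))
    if op == "NOT":
        return 1.0 - degree(doc, q[1])
    raise ValueError(f"unknown operator {op!r}")


# Hypothetical document in which "retrieval" is far more significant than "fuzzy"
doc = {"retrieval": 0.9, "fuzzy": 0.3}
print(degree(doc, ("AND", "retrieval", "fuzzy")))              # 0.3
print(degree(doc, ("OR", "retrieval", ("NOT", "fuzzy"))))      # 0.9
```

Unlike a Boolean evaluation, the result is a degree that can later be used to rank documents.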

It is customary to evaluate the effectiveness of a retrieval system with a pair of measures known as recall and precision [8]. Recall is the proportion of the relevant material that is actually retrieved from the file. Precision is the proportion of the retrieved material that is found to be relevant to the users' information needs. In principle, a search should achieve high recall by retrieving most of the relevant items, while maintaining high precision by rejecting a large proportion of the extraneous items. In that case, both the recall and the precision of the search are close to 1 (or 100 percent). In practice, recall and precision are known to vary inversely: it is difficult to retrieve everything that is wanted while also rejecting everything that is unwanted.
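The two definitions translate directly into set operations. The following minimal sketch uses hypothetical document identifiers to illustrate the inverse tendency: the broader result set finds more relevant items but also more noise. Returning 0 for an empty set is a convention chosen here, not something the text specifies.

```python
from typing import Set, Tuple


def recall_precision(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Recall = |relevant & retrieved| / |relevant|; precision = |relevant & retrieved| / |retrieved|."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0      # convention for empty sets
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


relevant = {"d1", "d2", "d3", "d4"}
narrow = {"d1", "d2"}                               # few items, all relevant
broad = {"d1", "d2", "d3", "d5", "d6", "d7"}        # more relevant items, more noise

print(recall_precision(narrow, relevant))   # (0.5, 1.0)
print(recall_precision(broad, relevant))    # (0.75, 0.5)
```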

In automatic retrieval systems, query formulations and document representations can be altered in an attempt to reach the desired recall and precision levels. In particular, recall-enhancing devices broaden the document and query identifiers in the hope of achieving higher recall. Analogously, precision-enhancing devices are designed to make the item identifications more specific in the expectation of obtaining better precision [18]. See Table 1.

An ideal IR system should provide mechanisms to express, besides specific requirements, a wide group of terms related to each requirement. In this way both recall and precision levels can be improved [8]. The use of weights in the formulation of queries tends to improve precision, and the specification of flexible queries goes exactly in this direction [1].

This work concentrates on a mechanism that improves recall by extending the original queries into approximate queries. For every term in the original query, a set of rules is applied to obtain other terms similar to the original one.

This approach is very important for users with high recall needs. For example, many lawyers are high-recall users: to know how a particular legal case should be approached, it is often important to examine all previous cases that may be similar in some sense to the current one. In other cases, users have only a vague idea of what to look for, either because they do not know the structure of the information (or which information is available) or because they are uncertain about the required information (or about how to express it in terms acceptable to the system).

2 Related Work

Current document search systems rely on criteria based on the following: document structure related to syntactic similarity [4]; lexicographic similarity of words that appear in certain parts of the text, based on stemming (suffix truncation) and on sound (phonetic similarity) [15]; and rigid searches on parts of the document [15], [2].

Table 1. Typical recall- and precision-enhancing devices

Recall-enhancing devices (term broadening)         | Precision-enhancing devices (term narrowing)
---------------------------------------------------|------------------------------------------------------
Term truncation (suffix removal)                   | Term weighting
Addition of synonyms                               | Addition of term phrases
Addition of related terms                          | Use of term co-occurrences in documents and sentences
Addition of broader terms (using term hierarchy)   | Addition of narrower terms (using term hierarchy)

PAT [19] is a system for free search of structured text, developed at the University of Waterloo for use with the Oxford English Dictionary. In this system the text is not assumed to have a schema-based structure. Instead, tags or marks delimit the structure of the document and are indexed as if they were words. PAT provides a variety of operators, and the result of a query is either a set of match points or a set of regions. This improves on the approach to structure search presented in [2], since it provides more flexibility in indexing the structure of the document.

A general proposal for approximate search is made in [5], which proposes a domain-independent framework for defining notions of similarity and the associated transformation rules: an object A can be approximated by another object B after a certain number of transformations. Another proposal for similarity- or association-based search, presented in [8], considers hierarchical proximity without taking into account the proximity that can be established through semantic relations. Other proximity criteria are based on the existing links in hypertext documents [10], or arise from moving from one query to another because of incompleteness in the process [20].

Another approach to the matching paradigm is to build a representation space for queries and documents in which similarity measures between them can be computed, so that documents are ranked by their similarity or relevance to the query. The advantage claimed for this method is that sophisticated relevance-feedback methods can be applied [11].

Another family of systems uses fuzzy logic to model information or to retrieve it from fuzzy sets. Some authors describe each document by a set of keywords, often represented as a vector whose dimension is the number of possible keywords. This makes it possible to define a distance between the document description (usually through a vector metric) and the requirement vector, and to retrieve the information nearest to that vector [6]. The use of fuzzy thesauri that take into account the proximity among synonymous keywords has also been widely accepted [12], [13].
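The keyword-vector description mentioned above can be sketched as follows. Each document and the requirement are sparse vectors indexed by the keyword vocabulary, and documents are ranked by their nearness to the requirement vector. The choice of cosine similarity as the nearness measure and the sample weights are assumptions for illustration only; the approach in [6] may use a different metric.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]   # sparse vector: keyword -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two sparse keyword vectors (0 if either is empty)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(docs: Dict[str, Vector], query: Vector) -> List[Tuple[str, float]]:
    """Order documents by decreasing similarity to the requirement vector."""
    scored = [(name, cosine(vec, query)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": {"oil": 1.0, "venezuela": 1.0},
    "d2": {"oil": 1.0},
    "d3": {"football": 1.0},
}
print(rank(docs, {"oil": 1.0, "venezuela": 1.0}))
```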
3 The Model

We define a formal framework for the queries that result from this expansion process. These queries are known as similarity-based queries.

Definition 1. A framework for similarity-based queries is a triple $Q_{\approx}$ defined by:

$$Q_{\approx} = (T, \Gamma, L) \tag{1}$$

where $T$ is the set of all terms that appear in the system, $\Gamma$ is the set of extension or transformation rules applied on $T$, and $L$ is the query language.



3.1 The Set of Terms 

An element $t_i \in T$ is a text expression, character string or pattern that has, from the linguistic point of view, a meaning of its own. The set of descriptor terms $W$ commonly used in the representation of documents is part of this set of terms, that is, $W \subseteq T$.

It is interesting to study the connections that can be established among the terms. There is considerable evidence that a quantitative study of written texts can be used, at least partly, to determine the information content, and that such an analysis can likewise produce groups of entities of the type usually used in thesauri (sets of words or phrases within a certain subject category, called a class of concepts) [16]. Many IR systems use thesauri or dictionaries to represent documents and to modify queries so as to improve the chance of finding a good number of relevant documents. The two term-broadening devices used in this work are semantic classes (expressed through thesauri) and taxonomies or hierarchies.

Semantic Classes. A principal method of term expansion is the use of thesauri, which provide synonyms and related terms that specify semantic classes [17], [16]. Each semantic class should contain terms that are semantically related and should be maximal, in the sense that if two terms are semantically related, they belong to the same class. Note that these semantic classes are expressed through terms with a gradual membership.

Different mechanisms have been proposed for the automatic construction of thesauri; the interested reader can consult [16], [12], [14], [?], among others. Other approaches, such as [17], [7], allow human intervention in the construction. In practice, many thesauri are built mainly by hand, in two ways:

1) By gathering words that deal with the same topic.

2) By gathering words that deal with related things.

The first type of thesaurus connects words that are interchangeable, that is, placed in equivalence classes. Thus, a word can be chosen to represent each class, and a list of these words can be used to form a controlled vocabulary. The second type uses semantic connections among the words, for example, hierarchical relationships.

In the present study, a fuzzy thesaurus is used to define the association between general and specific terms by specifying the general terms as fuzzy sets. In this way, a general term can be expressed by a fuzzy set over the specific terms. The membership degree of a specific term in this fuzzy set represents the relevance of that specific term to the general term. An initial attempt at using fuzzy thesauri is presented in [12], [13].

Definition 2. A fuzzy thesaurus is a set of general terms:

$$Th = \{\, t_g \mid t_g \in T \,\} \tag{2}$$

where each $t_g$ is defined by the fuzzy set

$$t_g = \{\, \mu_{t_g}(t_e)/t_e \mid t_e \in T \,\} \tag{3}$$

and $\mu_{t_g}(t_e) \in [0, 1]$ denotes the relevance degree of the specific term $t_e$ to the general term $t_g$.
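Definition 2 maps naturally onto a nested dictionary. Each general term $t_g$ is a fuzzy set over specific terms, stored as specific term → $\mu_{t_g}(t_e)$. The sketch below is a minimal representation under that assumption, with hypothetical helper names. The degrees are hypothetical examples chosen to echo the query example in Section 5. Terms with membership 0 are simply omitted.

```python
from typing import Dict

# Fuzzy thesaurus Th: general term t_g -> fuzzy set {specific term t_e: mu_tg(t_e)}
FuzzyThesaurus = Dict[str, Dict[str, float]]

th: FuzzyThesaurus = {
    "economic": {"OPEC": 0.8},                            # hypothetical degrees
    "suramerica": {"Venezuela": 0.6, "Colombia": 0.6},
}


def check(th: FuzzyThesaurus) -> None:
    """Enforce mu_tg(t_e) in [0, 1] for every pair, as Definition 2 requires."""
    for tg, fuzzy_set in th.items():
        for te, mu in fuzzy_set.items():
            if not 0.0 <= mu <= 1.0:
                raise ValueError(f"degree of {te!r} in {tg!r} is {mu}, outside [0, 1]")


def relevance(th: FuzzyThesaurus, tg: str, te: str) -> float:
    """mu_tg(t_e): relevance of specific term te to general term tg (0 if unlisted)."""
    return th.get(tg, {}).get(te, 0.0)


check(th)
print(relevance(th, "economic", "OPEC"))        # 0.8
print(relevance(th, "economic", "Colombia"))    # 0.0
```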



Hierarchies. A hierarchical arrangement of the terms included in the thesauri expresses additional concepts by relating each term to its parents, children, siblings and any possible set of cross-references. This hierarchical view of the information is important because it allows different representations of knowledge to be created at different levels of abstraction.

One of the fundamental deficiencies of data manipulation languages is that they assume a completely flat organization of the information. However, classification hierarchies (taxonomies) have been the traditional metaphor for organizing information, and search languages should take advantage of them. A possible solution consists of adopting the keyword taxonomies traditionally used in bibliographic reference systems as a general metaphor for organizing information.

However, it is not enough to use taxonomies to classify the information. Indeed, one of the aspects that differentiates keyword search technology from database manipulation languages is the ability to reason with these classification taxonomies. For example, suppose someone enters the following Boolean expression in a bibliographic search system: (heavy metal and not gold) or pollution. This indicates that all the required papers must deal with pollution or with some heavy metal other than gold. If the system contains an article about mercury, the system will return it as an answer, since mercury is a heavy metal.
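The reasoning in this example can be sketched by letting each query term match a document that mentions the term or any of its descendants in the taxonomy. The taxonomy below, the helper names, and the choice to expand terms only downward (towards more specific terms) are assumptions made to reproduce the mercury example. The taxonomy is also assumed to be acyclic.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical, acyclic taxonomy: parent -> children
taxonomy: Dict[str, List[str]] = {
    "heavy metal": ["mercury", "lead", "gold"],
}


def descendants(term: str) -> Set[str]:
    """The term itself plus everything below it in the taxonomy."""
    found = {term}
    for child in taxonomy.get(term, []):
        found |= descendants(child)
    return found


Query = Union[str, Tuple]


def matches(doc_terms: Set[str], q: Query) -> bool:
    """Boolean evaluation in which a term also matches its more specific terms."""
    if isinstance(q, str):
        return bool(descendants(q) & doc_terms)
    op = q[0]
    if op == "and":
        return matches(doc_terms, q[1]) and matches(doc_terms, q[2])
    if op == "or":
        return matches(doc_terms, q[1]) or matches(doc_terms, q[2])
    if op == "not":
        return not matches(doc_terms, q[1])
    raise ValueError(f"unknown operator {op!r}")


query = ("or", ("and", "heavy metal", ("not", "gold")), "pollution")
print(matches({"mercury"}, query))   # True: mercury is a heavy metal
print(matches({"gold"}, query))      # False: gold is excluded explicitly
```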

The simplest information organization systems are based on a hierarchical arrangement of their elements. In particular, the most popular operating systems use this paradigm to organize information in a directory tree (see Fig. 1). Nevertheless, this approach has a fundamental limitation: the data classification is subject to a single criterion. If a document refers to several topics, only one of them is considered when deciding in which directory to place the file. Therefore, some aspects of the file are not reflected by the position it occupies in the tree.

Fig. 1. World Hierarchy

This abstraction is defined formally below, together with the transformation rules.

3.2 Transformation Rules

Definition 3. Let O be a set of objects. A transformation rule defines a new set O' that contains objects close to the objects in O.

In this context we distinguish two types of rules: atomic and complex.

Atomic rules. An atomic rule applied to a term $t_i$ yields a term $t_i'$, where $t_i, t_i' \in T$ and $t_i \approx t_i'$ ($\approx$ means "is similar to").

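Definition 4. Formally, an atomic transformation rule has one of the following forms:

$$\begin{aligned}
r_1 &: \mathrm{Similarity}(A, A, 1).\\
r_2 &: \mathrm{Similarity}(A, B, \mu) \leftarrow \mathrm{relation}(A, B, \mu).\\
r_3 &: \mathrm{Similarity}(A, B, \mu) \leftarrow \mathrm{relation}(B, A, \mu).
\end{aligned} \tag{4}$$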

The semantics of these rules is as follows. Let $t_i \in T$ be some term. The associated terms $t_i'$ are those for which there exists a predicate $\mathrm{relation}(t_i, t_i', \mu)$ or $\mathrm{relation}(t_i', t_i, \mu)$, where each $t_i' \in T$ and $t_i \approx t_i'$.

Intuitively, two objects A and B are similar if they are in the relation, and $\mu$ indicates the degree to which they are related. Trivially, by the first rule each element is similar to itself with a total degree of satisfaction. Rule $r_1$ thus gives the reflexivity of similarity relations, and rules $r_2$ and $r_3$ together give their symmetry [21].
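Rules $r_1$ to $r_3$ can be read operationally: a term is similar to itself with degree 1, and similar to any term it is related to in either direction, with the degree of that relation. The following minimal sketch assumes that the `relation` predicate is stored as a set of (A, B, μ) facts. If both directions are stored with different degrees, it keeps the larger one; that tie-breaking choice is ours, not the paper's. The facts echo the Venezuela/Colombia and OPEC/economic example of Section 5.

```python
from typing import Dict, Set, Tuple

# relation(A, B, mu) facts; hypothetical values
relation: Set[Tuple[str, str, float]] = {
    ("Venezuela", "Colombia", 0.9),
    ("OPEC", "economic", 0.8),
}


def similar_terms(t: str) -> Dict[str, float]:
    """Terms t' with Similarity(t, t', mu), derived by atomic rules r1-r3."""
    result = {t: 1.0}                                   # r1: Similarity(A, A, 1)
    for a, b, mu in relation:
        if a == t:                                      # r2: from relation(A, B, mu)
            result[b] = max(result.get(b, 0.0), mu)
        if b == t:                                      # r3: from relation(B, A, mu)
            result[a] = max(result.get(a, 0.0), mu)
    return result


print(similar_terms("Colombia"))  # {'Colombia': 1.0, 'Venezuela': 0.9}
```

Because $r_3$ reads the stored facts in the reverse direction, the derived similarity is symmetric even though each fact is stored only once.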
Complex rules. A complex rule specifies a sequence of atomic transformations to be applied to the input term.

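Definition 5. Formally, a complex transformation rule has one of the following forms:

$$\begin{aligned}
r_4 &: \mathrm{Hierarchy}(A, B, \mu) \leftarrow \mathrm{Similarity}(A, B, \mu).\\
r_5 &: \mathrm{Hierarchy}(A, B, \mu) \leftarrow \mathrm{Similarity}(A, C, \mu_1),\ \mathrm{Hierarchy}(C, B, \mu_2),\ \mu = \min(\mu_1, \mu_2).
\end{aligned}$$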

The semantics of these rules is as follows. Let $t_i \in T$ be some term. The associated terms $t_i'$ are those for which, after a sequence of atomic transformations, there exists a predicate $\mathrm{Similarity}(t_i, t_i', \mu)$ with $t_i' \in T$ and $t_i \approx t_i'$. Alternatively, there exists a term $t_i''$ such that the predicates $\mathrm{Similarity}(t_i, t_i'', \mu)$ and $\mathrm{Similarity}(t_i', t_i'', \mu)$ both hold, and so on. Intuitively, a hierarchy relation is established among the elements. This mechanism allows hierarchical levels to be specialized among different terms. The same approach is used by cooperative query systems [3], which can represent multilevel objects with a certain semantic distance among them.
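Read as recursive rules, $r_4$ and $r_5$ define the Hierarchy facts as a least fixpoint. The degree of a chain is its weakest link, $\min(\mu_1, \mu_2)$. The sketch below computes that fixpoint by iteration. When a pair can be derived along several chains, it keeps the largest degree (max-min composition), which is our assumption. The Similarity facts are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def hierarchy_closure(similarity: Dict[Pair, float]) -> Dict[Pair, float]:
    """Least fixpoint of r4 and r5, keeping the best (max) degree per pair."""
    hier = dict(similarity)                      # r4: every Similarity fact is a Hierarchy fact
    changed = True
    while changed:
        changed = False
        for (a, c), mu1 in similarity.items():   # Similarity(A, C, mu1)
            for (c2, b), mu2 in list(hier.items()):   # Hierarchy(C, B, mu2)
                if c2 != c:
                    continue
                mu = min(mu1, mu2)               # r5: chain degree is the weakest link
                if mu > hier.get((a, b), 0.0):
                    hier[(a, b)] = mu
                    changed = True
    return hier


# Hypothetical Similarity facts: country -> region -> continent
sim = {("Venezuela", "suramerica"): 0.6, ("suramerica", "america"): 0.9}
h = hierarchy_closure(sim)
print(h[("Venezuela", "america")])   # 0.6 = min(0.6, 0.9)
```

The loop terminates because every update strictly raises a degree, and all degrees are drawn from the finite set of input values.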



4 The Query Language 

Definition 6 (4). The language L of the IR system is defined over an alphabet consisting of the following symbols:

a. A set $T$ of basic terms.

b. Logical connectors: $\wedge$, $\vee$ and $\neg$.

c. Parentheses: ( ).

d. The set $T^*$ of complex terms, defined recursively by the following rules:

   d.1. $\forall t \in T: t \in T^*$.

   d.2. $\forall t \in T^*: \neg t \in T^*$.

   d.3. $\forall t_1, t_2 \in T^*: t_1 \wedge t_2 \in T^*$.

   d.4. $\forall t_1, t_2 \in T^*: t_1 \vee t_2 \in T^*$.

   In the context of this work, some additional rules must be considered:

   d.5. $\forall t_1, t_2 \in T$ such that $t_1 \approx t_2$: $t_2 \in T^*$.

   d.6. The elements of $T^*$ are formed only by applying rules d.1 to d.5.

e. A subset of the real numbers between 0 and 1.

f. Slash: /.
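The alphabet of Definition 6 suggests a small abstract syntax tree. The following is a minimal sketch in which the class names are hypothetical. The printed form w(μ/t), with an importance w and a membership μ, is borrowed from the example in Section 5. Weights are checked against [0, 1] (item e). Terms obtained through rule d.5 are simply further `Term` nodes with a membership below 1.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Term:
    """Basic term with user importance w and membership degree mu, printed as w(mu/t)."""
    name: str
    importance: float = 1.0
    membership: float = 1.0

    def __post_init__(self) -> None:
        # item e of Definition 6: weights come from the real interval [0, 1]
        for value in (self.importance, self.membership):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"weight {value} is outside [0, 1]")


@dataclass(frozen=True)
class Not:                      # rule d.2
    arg: "Query"


@dataclass(frozen=True)
class And:                      # rule d.3
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:                       # rule d.4
    left: "Query"
    right: "Query"


Query = Union[Term, Not, And, Or]   # complex terms T*


def show(q: Query) -> str:
    """Render a query in the textual form used in the example of Section 5."""
    if isinstance(q, Term):
        return f"{q.importance:g}({q.membership:g}/{q.name})"
    if isinstance(q, Not):
        return f"not ({show(q.arg)})"
    if isinstance(q, And):
        return f"({show(q.left)} and {show(q.right)})"
    if isinstance(q, Or):
        return f"({show(q.left)} or {show(q.right)})"
    raise TypeError(q)


q = And(Term("Venezuela", 0.8), Term("OPEC", 0.9))
print(show(q))   # (0.8(1/Venezuela) and 0.9(1/OPEC))
```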
5 Example

As a small example of approximate query execution, consider the query "Give me all the documents related to Venezuela and OPEC, with relevance levels of 0.8 and 0.9, respectively". It is written as:

0.8(Venezuela) and 0.9(OPEC)

Then, the relationships among Venezuela, Colombia and South America, and between OPEC and economy, are recognized through the fuzzy thesauri and hierarchies. Note that in the case of specific queries [1], the terms Venezuela and OPEC belong to the user's original query with a membership degree equal to 1. In extended queries, this degree varies with the degree of similarity among the different terms. For example, there are semantic relationships between the terms Venezuela and Colombia, which means that they have a proximity value. South America has a hierarchical relationship with Venezuela, since Venezuela is a South American country. There are also semantic relationships between OPEC and economy. We therefore have two interpretations of the membership degree:

1) The degree of importance or preference of the term for the user. The user defines this degree explicitly, since only the user knows the importance of the terms and can rank them.

2) The membership degree of the term with respect to the query. This degree is defined implicitly by the implementation, and its value depends on the degree of relationship among the terms.

0.8(1/Venezuela) and 0.9(1/OPEC)

0.8(1/Venezuela) and 0.9(0.8/economic)

0.8(0.9/Colombia) and 0.9(1/OPEC)

0.8(0.9/Colombia) and 0.9(0.8/economic)

0.8(0.6/suramerica) and 0.9(1/OPEC)

0.8(0.6/suramerica) and 0.9(0.8/economic)
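These six queries form the Cartesian product of the alternatives for each original term. The sketch below assumes that each term's alternatives have already been obtained from the thesaurus and hierarchy rules. Each term contributes itself with degree 1, plus its similar terms with the degrees used above. The function name is hypothetical.

```python
from itertools import product
from typing import List, Tuple

# For each original query term: user importance w and its (membership, term) alternatives
Alternatives = List[Tuple[float, str]]
query: List[Tuple[float, Alternatives]] = [
    (0.8, [(1.0, "Venezuela"), (0.9, "Colombia"), (0.6, "suramerica")]),
    (0.9, [(1.0, "OPEC"), (0.8, "economic")]),
]


def expand(query: List[Tuple[float, Alternatives]]) -> List[str]:
    """All simple queries obtained by picking one alternative per original term."""
    importances = [w for w, _ in query]
    results = []
    for choice in product(*(alts for _, alts in query)):
        parts = [f"{w:g}({mu:g}/{term})" for w, (mu, term) in zip(importances, choice)]
        results.append(" and ".join(parts))
    return results


for simple_query in expand(query):
    print(simple_query)
```

`itertools.product` varies the last term fastest, so the queries come out in the same order as the list above.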

Next, the resulting simple queries are processed according to the conditions specified in each query, and the retrieval status value is calculated. The result of this evaluation process is a document set ranked in decreasing order of retrieval status value. All of these documents must satisfy a minimum level required by the user.
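The paper does not specify how the retrieval status value is computed. The sketch below shows one possible choice, with hypothetical names. Each term is scored as μ·F(d, t), where F(d, t) is the term's significance in document d. The terms of a simple query are combined with a weighted conjunction, min over i of max(1 − w_i, score_i), so that less important terms constrain the result less. The best value over all expanded queries is kept, and only documents that reach the user's minimum level are returned.

```python
from typing import Dict, List, Tuple

SimpleQuery = List[Tuple[float, float, str]]   # (importance w, membership mu, term)


def rsv_simple(doc: Dict[str, float], q: SimpleQuery) -> float:
    """Weighted conjunction: min_i max(1 - w_i, mu_i * F(d, t_i)); an assumed scheme."""
    return min(max(1.0 - w, mu * doc.get(t, 0.0)) for w, mu, t in q)


def rank(docs: Dict[str, Dict[str, float]], queries: List[SimpleQuery],
         threshold: float) -> List[Tuple[str, float]]:
    """RSV(d) = best value over all expanded simple queries; keep d if RSV >= threshold."""
    scored = []
    for name, doc in docs.items():
        rsv = max(rsv_simple(doc, q) for q in queries)
        if rsv >= threshold:
            scored.append((name, rsv))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


queries = [
    [(0.8, 1.0, "Venezuela"), (0.9, 1.0, "OPEC")],
    [(0.8, 0.9, "Colombia"), (0.9, 0.8, "economic")],
]
docs = {   # hypothetical term significances F(d, t)
    "d1": {"Venezuela": 1.0, "OPEC": 1.0},
    "d2": {"Colombia": 1.0, "economic": 1.0},
    "d3": {"football": 1.0},
}
print(rank(docs, queries, threshold=0.5))   # [('d1', 1.0), ('d2', 0.8)]
```

Document d2 is retrieved only through the expanded query. This is the recall gain that the framework aims for.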

6 Conclusions 

We have defined a formal framework for approximate queries, based on extending queries through fuzzy thesauri and hierarchies as a way to improve retrieval recall in document information retrieval systems. This framework has three components: T, the set of all terms that appear in the system; Γ, the set of transformation rules applied on T; and L, the query language. Finally, we presented a small example of queries that approximate the initial one as a way to improve retrieval recall.
Abstract. Researchers in the field of database systems are bound to expand database techniques beyond traditional areas, particularly to the realm of the Web. With the growing importance of information technology in today's industry, new types of applications are appearing that require a broader understanding of information. Recently, many efforts have been made to understand and model the subjective factors that play an important role in our social life and communication, so that they can be embedded into new information technologies. We propose here our framework for endowing a software agent with the capability to personalize itself to its user through interaction and to perform tasks that involve subjective parameters. We describe K-DIME, a software prototype that can retrieve material from the Web on the basis of both objective and subjective features of the content. K-DIME allows users to create their own Kansei User Model. Because it can bootstrap a new user model from the model of a user with a similar profile, it significantly reduces the workload generally associated with an online learning phase. Continuous adaptation driven by specific patterns of interaction with the user enables K-DIME to cope with the intrinsic variability of subjective impressions. A working prototype has been implemented and applied in a scenario in which users are asked to identify pictures they would like to display on a greeting card.



1 Introduction 

Researchers in the field of database systems are bound to expand database techniques beyond traditional areas, particularly to the realm of the Web. With the growing importance of information technology in today's industry, new fields of application are arising that require a broader definition of information (e.g. one that includes subjective parameters for applications in the design industry, advertisement, entertainment or agent-mediated Internet shopping).
Recently, many efforts have been made to understand and model the subjective factors (Kansei in Japanese) that play an important role in our life, so that they can be embedded into new information technologies (entertainment, design, etc.). In information management systems (music, images, etc.) that deal with subjective requests [1] [2] [3] [4] [5] [6], the computer is generally not embodied (i.e. embedded in a mechanical structure with sensors and degrees of freedom [7] [8]), and consequently the modality of interaction between the user and the system is very limited. The issue has been addressed mainly by using psychological profiles or other types of user models. In those works the user model is embedded in the system, so that it is difficult for the user to access it, either functionally or structurally.

Without knowing the structure of the user model, the user cannot identify the elements of his/her request that led to an unsatisfactory answer from the system. Even when the system takes user feedback into account, the user cannot monitor the progress of the user model. Moreover, a user's subjectivity is not static: it is (a) characterized by a continuous dynamical process and (b) difficult to externalize [9] [10]. In this paper, we propose to draw inspiration from human-human interaction. Mutual understanding over emotional responses can be reached through social interaction, i.e. an additional modality of interaction. We propose here our framework, K-DIME, for endowing a software agent with the capability to personalize itself to its user through interaction and to retrieve material from the Internet on the basis of both objective and subjective features of the content.



2 State of the Art: Retrieving Images from the Web 

Search engines, including commercial ones, have been developed to retrieve images from the Web. Retrieval is based on explicit low-level features of the images, on keywords indicating the possible content of the images, or on examples, using statistical methods to compute the similarity among images or pattern recognition techniques for sketches. To refine the search, some systems use user feedback, such as selecting or discarding images, or manipulating the low-level features of the images.

Webseek [11] is a content-based visual query system designed to search images 
and videos from the web on the basis of the low-level features of the images, i.e. 
histogram, texture, and associated text. Users query images and videos through 
pre-defined subject categories, entering text and using visual content tools to 
establish the low-level features to be matched. 

ImageRover [12] is also a content-based visual search engine that supports com- 
plex feedback such as the user producing a set of relevant images. ImageRover 
utilizes such information to construct an integrated vector of features and refine 
the query. 

Meta web servers for image retrieval have also been proposed, e.g. Meta-Seek [13]. They aim to autonomously select a suitable set of search engines for answering queries. Depending on the query, the selected search engines are tuned, and their results are integrated before being presented to the user.

3 State of the Art: Retrieving Images 
by Subjective Parameters 

Image retrieval using subjective parameters has started to attract attention in recent years. Shibata et al. [2] presented a system to retrieve "street-landscape" images using subjective models. The model is based on the relationship between adjectives and visual low-level features. The underlying idea in this work is "independence": 1) independence of the visual low-level features considered in the construction of the kansei model, and 2) independence of the physical parameters involved in the process by which Kansei (or subjectivity) discriminates the environment. The parameters used are color, direction, movement and position. Through different experiments, the authors attempted to demonstrate the role played by the independence of parameters in their relations with impression words.

ART museum [3] is a system that is capable of retrieving impressionist paintings 
from a local database on the basis of a kansei model. The model of the user 
is refined by user feedback (relevant, not relevant). Impression words relate to 
color features of the images after Principal Component Analysis. 

Isomoto et al. [4] proposed a structure of keywords and key-images as attributes 
of paintings for fuzzy theory-based information retrieval. Each attribute of a 
painting is described as a fuzzy set and a membership function that represents 
the grade of the keywords or key-images (colour, shapes). The paintings are 
stored along with a bibliographical description (painter, date, etc.) and a content 
description (colour, position of the objects) and the impression words conveyed 
by the painting. Synonyms and antonyms are formalised as fuzzy relations in 
the retrieving process. 

Hattori et al. [5] present a system for "interactive and creative art appreciation education". Students can query a previously indexed painting database through the Web and compare their kansei judgement with those of other students and with the painter's kansei criteria. This process leads the students to externalise (or become aware of) their "subjective mind process".

Finally, Imai et al. [6] present a colour co-ordination system evaluated on eye make-up. Using neural networks, the system incrementally learns new satisfactory colour co-ordinations according to user feedback.



4 K-DIME

K-DIME, or Kansei Dime [14], is a software environment that enables users to
retrieve images from the Web on the basis of textual keywords and to filter the
results according to a kansei model built on relations between impression words
and image low-level features. K-DIME has been extended to allow users to
personalise their own user model.







N. Bianchi-Berthouze and T. Kato 



Users connect to K-DIME using a Web browser, and specify their requirements 
in terms of both the objective and subjective properties of the images. K-DIME 
relies on three essential components (Figure 1): (a) active agents called Oracles 
that model users’ subjectivity, (b) an Image Filter that retrieves and filters 
information from the Web and (c) MIKE [15] that supervises the personalization 
of Oracles following interactive sessions with the users. 




Fig. 1. K-DIME architecture 



4.1 Oracle 

An Oracle is a computational agent able to interact and dialog with users over 
subjective concepts. Through interaction, Oracles and users co-evolve a multime- 
dia language. The attributes of an Oracle are a user profile, an image-processing 
kernel, a learning kernel and an action evaluator. The user profile consists of a
collection of data about the user who trained the Oracle, e.g. nationality,
gender, age, job, hobbies, a list of known words and the relations among them
(opposite, synonym, nuance, etc.). The image-processing kernel integrates sev- 
eral image analysis tools (color, shape, texture) to construct a signature of each 
image which is then fed to the learning kernel. 

The signature of an image is constructed from its information of color, texture 
and shape. More specifically, images are segmented in the HSB (Hue, Saturation, 
Brightness) space, as shown in Figure 2. The plane SB is divided into 6 regions 
and 10 hues are considered. From this segmentation, two vectors are built, one 
taking into account only the SB segmentation and the other one considering SB 
and Hues. An example of segmentation is shown in Figure 2b. 
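To make the two colour vectors concrete, here is a minimal Python sketch. The text fixes only the counts (6 regions of the SB plane, 10 hues); the equal-width bands used below (3 saturation bands × 2 brightness bands, 10 equal hue sectors) and the normalised-histogram form are assumptions for illustration, not the partition actually used in K-DIME.

```python
import colorsys

N_HUES = 10               # the text states that 10 hues are considered
S_BANDS, B_BANDS = 3, 2   # assumption: 3 x 2 = 6 regions of the SB plane

def _band(x, n):
    """Map a value in [0, 1] to one of n equal-width bands."""
    return min(int(x * n), n - 1)

def colour_vectors(pixels):
    """Return (sb_hist, sb_hue_hist) for an iterable of (r, g, b) values in [0, 1].

    sb_hist has 6 bins (SB regions only); sb_hue_hist has 6 * 10 bins
    (SB region crossed with hue sector). Both are normalised frequencies.
    Achromatic pixels fall into hue sector 0 (a simplification).
    """
    n_sb = S_BANDS * B_BANDS
    sb_hist = [0.0] * n_sb
    sb_hue_hist = [0.0] * (n_sb * N_HUES)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)   # HSV is the same space as HSB
        sb = _band(s, S_BANDS) * B_BANDS + _band(v, B_BANDS)
        sb_hist[sb] += 1
        sb_hue_hist[sb * N_HUES + _band(h, N_HUES)] += 1
        count += 1
    if count:
        sb_hist = [x / count for x in sb_hist]
        sb_hue_hist = [x / count for x in sb_hue_hist]
    return sb_hist, sb_hue_hist
```

Treating every pixel equally is the simplest choice; a real segmentation would first group pixels into regions, as Figure 2b suggests.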

The segmented images are used to extract region-shapes [16] from the original 
images (Figure 2c). Important regions (whose area is above a given threshold) 
are described in terms of: area, position, circularity, homogeneity and direction. 
Fig. 2. Image processing: a) the original image, b) the segmented image, c) extracted
regions and shape measure computation, d) the neural networks fed by the signatures
produced by the image processing.



The original images are also divided into 5 areas and the texture parameters [17]
are computed. The above process produces 4 vectors (HSB, Tone, Shape, Texture)
that together form the signature of an image, for a total of 154 parameters.

The core idea behind our definition of Oracles is that the “meaning” of the impression
words used by users is grounded in the low-level characteristics of the images
(or of whatever medium) at the origin of that impression. Hence the learning kernel
is set to learn associations between the output of the image-processing kernel and
impression words under the user’s supervision. The learning kernel consists of a set
of learning modules, each dedicated to a single impression word. As the language
evolves, new modules are added to the learning kernel. A module is a set of neural
networks (hsb, tone, shape, texture) (Figure 2d), trained by back-propagation with
momentum, that learn the relations between the low-level input features and the
saliency of the module’s impression word with respect to that input, as evaluated
by the user. The learning of the modules (which corresponds to a personalization
of the Oracle) takes place during interactive sessions in which the system aims
at identifying inconsistencies or incompleteness in users’ definitions of impression
words and the users provide new examples to resolve potential deadlocks.
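A word module can be sketched as one small network per feature group, each trained by back-propagation with momentum on the user’s judgements. This is a hedged illustration assuming numpy, one hidden layer, squared-error loss and plain averaging of the networks’ outputs into a single saliency score; the class names `MomentumMLP` and `WordModule`, the layer sizes and the learning parameters are hypothetical, since the text does not specify them.

```python
import numpy as np

class MomentumMLP:
    """One-hidden-layer sigmoid network trained by back-propagation with momentum."""

    def __init__(self, n_in, n_hidden=8, lr=0.5, momentum=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.params = [rng.normal(0, 0.5, (n_in, n_hidden)), np.zeros(n_hidden),
                       rng.normal(0, 0.5, (n_hidden, 1)), np.zeros(1)]
        self.velocity = [np.zeros_like(p) for p in self.params]
        self.lr, self.momentum = lr, momentum

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(self, X):
        W1, b1, W2, b2 = self.params
        self.hidden = self._sigmoid(X @ W1 + b1)
        return self._sigmoid(self.hidden @ W2 + b2).ravel()

    def train_step(self, X, y):
        """One full-batch update on squared error; returns the loss before the update."""
        W1, b1, W2, b2 = self.params
        out = self.forward(X)
        d_out = ((out - y) * out * (1 - out))[:, None]              # (n, 1)
        d_hid = (d_out @ W2.T) * self.hidden * (1 - self.hidden)    # (n, n_hidden)
        grads = [X.T @ d_hid, d_hid.sum(0), self.hidden.T @ d_out, d_out.sum(0)]
        for p, v, g in zip(self.params, self.velocity, grads):
            v *= self.momentum          # keep a fraction of the previous step
            v -= self.lr * g / len(X)   # add the new gradient step
            p += v                      # in-place parameter update
        return float(np.mean((out - y) ** 2))

class WordModule:
    """Hypothetical module for one impression word: one network per feature group.
    Saliency is the mean of the networks' outputs (an assumption)."""

    def __init__(self, group_sizes):
        self.nets = {g: MomentumMLP(n, seed=i)
                     for i, (g, n) in enumerate(group_sizes.items())}

    def train(self, signatures, labels, epochs=500):
        for g, net in self.nets.items():
            for _ in range(epochs):
                net.train_step(signatures[g], labels)

    def saliency(self, signatures):
        return np.mean([net.forward(signatures[g]) for g, net in self.nets.items()], axis=0)

# Toy usage: judgements that depend only on the first colour feature.
rng = np.random.default_rng(1)
signatures = {"hsb": rng.random((40, 6)), "texture": rng.random((40, 5))}
labels = (signatures["hsb"][:, 0] > 0.5).astype(float)
module = WordModule({"hsb": 6, "texture": 5})
module.train(signatures, labels)
scores = module.saliency(signatures)
```

Keeping one network per feature group makes it easy to retrain, reweight or drop a single group for a given word without touching the others.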
The action evaluator processes the user feedback gathered during the interactive
session and extracts important facts that can directly affect the learning module,
by changing the topological structure of the neural networks or the training set,
and by creating or updating relations with other words.



4.2 Image Filter 

The Image Filter lets users browse the Web and retrieve images on the basis
of subjective and objective keywords. Communication between the Image Filter and
the Web is mediated by Dime, the Distributed Information Manipulation Environment.
Users describe the images to be retrieved using Web forms, which are
submitted to the system through HTTP. The objective parameters in users’ requests
(e.g. images of Kyoto) are used to query existing search engines/databases,
such as Alta-Vista, Lycos or Yahoo [18] [19] [20]. The Image Filter analyses and
integrates the results of the query, then selects and configures Oracles to assess
them against the subjective criteria specified by the user (e.g. romantic images of
Kyoto).
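The query, merge and filter flow just described can be sketched as follows. The search functions and the `oracle_score` callable are hypothetical stand-ins (no real search-engine API is used), and the fixed score threshold is an assumption.

```python
from typing import Callable, Dict, Iterable, List, Tuple

def image_filter(objective_query: str,
                 engines: Dict[str, Callable[[str], Iterable[str]]],
                 oracle_score: Callable[[str], float],
                 threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Query every engine with the objective keywords, merge the results and
    keep the images whose Oracle score for the subjective word reaches `threshold`.

    `engines` maps a name to a hypothetical search function returning image
    URLs; `oracle_score` stands in for an Oracle's word module.
    """
    # Integrate the result lists, dropping URLs returned by several engines
    # (dict.fromkeys keeps first-seen order).
    candidates = dict.fromkeys(url for search in engines.values()
                               for url in search(objective_query))
    scored = [(url, oracle_score(url)) for url in candidates]
    relevant = [(url, s) for url, s in scored if s >= threshold]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)

engines = {"engine_a": lambda q: ["a.jpg", "b.jpg"],
           "engine_b": lambda q: ["b.jpg", "c.jpg"]}
scores = {"a.jpg": 0.9, "b.jpg": 0.2, "c.jpg": 0.7}
print(image_filter("Kyoto", engines, scores.get))  # [('a.jpg', 0.9), ('c.jpg', 0.7)]
```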

Figure 3 depicts the query page of the Image Filter in a Web browser. The
user (1) enters a set of objective keywords, (2) selects one or more entries from
the list of subjective keywords defined in existing Oracles, (3) indicates the
criteria on the basis of which Oracles will be selected and (4) selects the search
engines to be queried with the above objective keywords.

Fig. 3. Querying K-DIME for “romantic” images of airplanes using Oracles similar in
gender.

Figure 4 shows the first page of results for the query “romantic images of
airplanes”. At the time of the snapshot, the query was still underway: 272 images
of airplanes from Alta-Vista had been analyzed, of which 24 were considered
relevant to the impression word “romantic” by the Oracle named “Kenji”.






Fig. 4. Filtered results for the query “romantic airplane” of Fig. 3. Not having found a
definition of “romantic” for the user Paul, the Image Filter selected the Oracle of Kenji
according to the gender criterion.



4.3 MIKE: A Multimedia Interactive Environment for Kansei 
Communication 

MIKE is an interactive environment in which users create and personalize
their Kansei user model. It is endowed with simple turn-taking capabilities
(e.g. proposing a word or an image) so that both user and Oracle are truly active
in the dialog. Both user and Oracle can choose visual examples (images) and
show them to each other. Disagreement between the two agents on a judgment
leads to further interaction. The interactive sessions are supported by an
“action-database” that keeps track of users’ actions, i.e. (dis)agreeing with the
system, proposing new examples for an impression word or vice versa, shifting the
focus of attention from one image region or feature to another, eliciting reasons
for (dis)similarities between images. In turn, the Oracle uses that information
to highlight inconsistencies, ask for clarification, re-train the neural networks or
modify the image-processing kernel related to the impression word. The latter
point is crucial since different impressions can relate to different parameters. For
example, “romantic” or “fresh” might be more related to color features while
“imposing” or “brave” impressions might derive from shape characteristics.



5 K-DIME Features

No need for personal models to perform queries, and
bootstrapping of the learning process. Using the profiles embedded in
the Oracles, the system can answer queries from new users by automatically
selecting Oracles whose profile is similar to the new user’s, provided that those
Oracles contain a model of the subjective keywords queried. When an Oracle cannot
handle a request, additional Oracles can be used to answer it and produce an
integrated result. The similarity between two user profiles is computed according
to the criteria entered by the new user in the query form. Upon user feedback
(e.g. discarding images that do not fit his/her understanding of the queried
subjective keyword) (Figure 5), a personalized word model can be created and
added to the Oracle.
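Oracle selection for a new user might look like the sketch below. The text only says that similarity is computed according to the criteria chosen in the query form; counting matching profile fields, the dictionary layout of an Oracle and the sample Oracles “A”, “B”, “C” are assumptions for illustration.

```python
from typing import Dict, List

def select_oracles(user_profile: Dict[str, str], criteria: List[str],
                   oracles: List[dict], word: str, k: int = 1) -> List[str]:
    """Return the names of up to k Oracles that define `word`, ranked by how
    many of the chosen criteria (e.g. "gender") match the new user's profile."""
    def similarity(oracle: dict) -> int:
        profile = oracle["profile"]
        return sum(1 for c in criteria
                   if c in user_profile and profile.get(c) == user_profile[c])

    eligible = [o for o in oracles if word in o["words"]]   # must model the word
    ranked = sorted(eligible, key=similarity, reverse=True)  # ties keep input order
    return [o["name"] for o in ranked[:k]]

# Hypothetical Oracles, for illustration only.
oracles = [
    {"name": "A", "profile": {"gender": "male", "nationality": "JP"}, "words": {"romantic", "happy"}},
    {"name": "B", "profile": {"gender": "female", "nationality": "IT"}, "words": {"romantic"}},
    {"name": "C", "profile": {"gender": "female", "nationality": "JP"}, "words": {"happy"}},
]
print(select_oracles({"gender": "female"}, ["gender"], oracles, "romantic"))  # ['B']
```

Returning a ranked list rather than a single Oracle leaves room for the case mentioned above, where several Oracles answer a request and their results are integrated.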






Fig. 5. Filtered results of the query “Fresh images of Maui” with user feedback (checked 
= discarded). 
Dynamic introduction of new words and associated
training sets. New subjective words can be added to an Oracle even if they are
not defined by any of the Oracles. This is achieved either directly, in interactive
sessions, by creating an appropriate training set, or indirectly, through the WEB
browser, by commenting on the results of a query. As shown by Figure 5, the user,
when presented with the results of a query, can suggest new words to describe
the retrieved images.







Fig. 6. The user Karin and the Oracle are discussing the impression word “happy”.
The user has shown her agreement and disagreement with the images retrieved for
“happy”. The Oracle highlights the discordances with previous evaluations and updates
the training set.



Automatic Oracle customisation. The learning process is done either auto- 
matically or interactively on the basis of user feedback for a word. Reclassified 
images or discarded images are added to the training set of the corresponding 
word with the value (positive or negative) given by the user. An analysis of the
training set is performed in order to identify inconsistencies in the user’s
classification, i.e. the same image or similar images classified with opposite evaluations.
When no inconsistency is encountered, a new learning phase takes place and 
the Oracle is updated. In case of inconsistencies, an interactive customization is 
suggested to the user in order to improve the robustness of the learning phase. 
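The automatic path can be sketched as: merge the feedback into the training set, look for pairs of similar images with opposite labels (and for images re-judged with the opposite value), and retrain only if none are found. The signature representation, the root-mean-square distance and the 0.1 threshold are assumptions; `retrain` is a hypothetical callback standing in for the learning phase.

```python
import math
from typing import Callable, Dict, List, Tuple

Signature = Tuple[float, ...]                    # features assumed scaled to [0, 1]
TrainingSet = Dict[str, Tuple[Signature, int]]   # image id -> (signature, +1 / -1)

def find_inconsistencies(training: TrainingSet,
                         max_distance: float = 0.1) -> List[Tuple[str, str]]:
    """Pairs of images with opposite labels whose signatures are closer than
    `max_distance` (root-mean-square difference per feature)."""
    items = list(training.items())
    pairs = []
    for i, (id_a, (sig_a, lab_a)) in enumerate(items):
        for id_b, (sig_b, lab_b) in items[i + 1:]:
            if lab_a != lab_b and math.dist(sig_a, sig_b) / math.sqrt(len(sig_a)) <= max_distance:
                pairs.append((id_a, id_b))
    return pairs

def apply_feedback(training: TrainingSet, feedback: TrainingSet,
                   retrain: Callable[[TrainingSet], None],
                   max_distance: float = 0.1) -> List[Tuple[str, str]]:
    """Merge user feedback into the training set; retrain automatically if no
    inconsistency is found, otherwise return the conflicts so that an
    interactive customisation can be suggested to the user."""
    conflicts = []
    for img, (sig, label) in feedback.items():
        if img in training and training[img][1] != label:
            conflicts.append((img, img))   # same image judged differently over time
        else:
            training[img] = (sig, label)
    conflicts += find_inconsistencies(training, max_distance)
    if not conflicts:
        retrain(training)
    return conflicts
```

The pairwise scan is quadratic in the training-set size, which is acceptable for the small, user-provided training sets involved here.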







6 Interactive Oracle Customisation 

The interactive sessions (Figures 6, 7) provide the user with a dynamic
representation of the Oracle’s activity so as to (a) provide him/her with a synthetic
view of the model constructed and (b) facilitate his/her reading of the possible 
reasons for misunderstanding (inappropriate selection of images or features to 
be considered). Table 1 gives a short description of the possible actions available 
to Oracle and User during an interactive session. These patterns of interaction 
aim at: evaluating the learning process, modifying word module structures, spe- 
cialising the image processing for each word and specialising the lexicon. 




Fig. 7. An Oracle uses a similarity matrix to highlight inconsistencies in the training 
set. Cells in the central “square” of the matrix correspond to a normalized interdistance 
between positive examples (in row) and negative examples (column). Light (resp. dark) 
color denotes a high (resp. low) degree of similarity. Users can browse the training set 
by clicking on any matrix cells. Images appear in the right-hand side of the window 
while the corresponding signatures are displayed on top of the similarity matrix. 



Some examples of interactive patterns and their consequences are given in 
the dialogs below. 

Learning Process Evaluation. A statistical (learning error) evaluation of the
learning process can be made by both actors by testing the retrieval or judging
capabilities of the Oracle and through a qualitative and quantitative analysis of
the training set.
Table 1. Oracle and user actions during interactive sessions.



THE ORACLE CAN: 

1. express the reason behind its evaluation (externalization):

- showing the training set given for a word by the user

- showing similarities between features of the training set and of selected images

2. ask for explanation when inconsistencies appear in the user evaluation:

- in the training set: images with similar features showing opposite evaluation

- the same image being judged differently over time

3. propose the splitting of an impression word in two or more sub-impressions 
(create nuances): 

- when a clustering of the images appears 

- before adding to a training set an image very distant from the images in the 
training set 

- in the training set: images with similar features showing opposite evaluation 

4. identify the set of features to be used in the learning process: 

- within a session, the user tends to use an impression word with the same
meaning. The images used as examples allow the Oracle to identify and propose
the set of features that define the impression word

- when a new image is very distant from the training set. 

THE USER CAN: 

1. evaluate the state of the learning process 

- asking for images responding to an impression word and showing the agreement 
or disagreement with the result 

- observing qualitative and quantitative views of the training set

2. emphasize the importance of features with respect to an impression word: 

- shifting the focus of attention to a different region of an image 

- giving new examples that display the desired features

3. modify the training set: 

- deleting from the training set negative examples that show high similarities with 
positive examples 

- providing redundant classifications of the training set in order to reduce
inconsistencies

4. create relations among words: opposite, synonym, nuance relations. 
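Relation creation can be illustrated with the checks the Oracle performs before adding a negative relation such as “happy” to “dark” in the example dialogs. In this hypothetical `Lexicon` sketch, a relation is refused if the two words are already linked as synonyms or nuances, or if some image is a positive example of both words; these rules paraphrase the “dark” exchange and are not taken from the K-DIME code.

```python
from collections import defaultdict
from typing import List

class Lexicon:
    """Hypothetical lexicon: typed relations between impression words plus the
    positive training examples of each word."""

    def __init__(self):
        self.relations = defaultdict(dict)   # word -> {other word: relation type}
        self.positives = defaultdict(set)    # word -> ids of positive example images

    def add_negative_relation(self, word: str, other: str) -> List[str]:
        """Add "word excludes other" unless it contradicts what is known.
        Returns the problems found; an empty list means the relation was added."""
        problems = []
        existing = self.relations[word].get(other)
        if existing in ("synonym", "nuance"):
            problems.append(f"'{other}' is already a {existing} of '{word}'")
        shared = self.positives[word] & self.positives[other]
        if shared:
            problems.append(f"images {sorted(shared)} are positive for both words")
        if not problems:
            self.relations[word][other] = "negative"
        return problems
```

Returning the list of problems, instead of raising, matches the interaction style described here: the Oracle reports inconsistencies to the user rather than silently refusing.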



ORACLE: Browses the training set for “happy” and detects inconsistencies.

Images 256 and 330 in the training set have opposite judgements but similar features.

USER: It is true, image 256 is not “happy”. You can delete it.

ORACLE: Deletes image 256 from the training set and re-learns.

Word Module Structure Modification. Even though humans share the same
visual apparatus, emotive responses to visual perceptions vary from one individual
to another, and also contextually, because perceptions are affected by other
cognitive processes. From our perspective, this implies that (a) a given low-level
feature will not necessarily be relevant to all impression words, (b) its relevance
will vary from word to word, and even within a word it could vary over time, and
(c) the pre-processing of the visual features and their labelling will change from
word to word. Hence, the structure of each network and its place in a module are
variable. Naturally, a modification of the network topology requires re-training
of the network.



USER: Image 05560 is “imposing”

ORACLE: Browses the training set for “imposing” and finds discrepancies
in the related colour features.

Image 05560 is similar to 730 (not “imposing”) according to colour features.

USER: Colour features are less important than position and area.

ORACLE: Checks the network topology, reinforces the weight of the networks that
take area and position as input and decreases the weight of the colour networks.
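The reweighting step in this dialog can be sketched as a change to the weights with which a module combines its networks. The multiplicative update, the factor 1.5 and the renormalisation are assumptions; we also assume that the hsb and tone networks are the colour networks and that area and position feed the shape network.

```python
def reweight(weights, boosted, damped, factor=1.5):
    """Return new combination weights for a word module's networks.

    weights: {network_name: weight}. Networks in `boosted` are multiplied by
    `factor`, those in `damped` are divided by it, and the result is
    renormalised to sum to 1. The multiplicative rule is an assumption.
    """
    new = {}
    for net, w in weights.items():
        if net in boosted:
            w *= factor
        elif net in damped:
            w /= factor
        new[net] = w
    total = sum(new.values())
    return {net: w / total for net, w in new.items()}

# "Colour features are less important than position and area."
w = reweight({"hsb": 0.25, "tone": 0.25, "shape": 0.25, "texture": 0.25},
             boosted={"shape"}, damped={"hsb", "tone"})
```

Renormalising keeps the module’s output on the same scale, so a word’s saliency scores remain comparable before and after the user’s correction.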



Lexicon Specialisation. Lexicon specialisation takes place when (a) a new
impression word is used, (b) nuances of the same word are recognised or (c)
relations are established between words.

USER: Image 05010 is “dynamic” and “happy”

ORACLE: Checks the lexicon and finds that it does not know the word
“dynamic”. Creates a module for “dynamic” and adds 05010 to its training set.

I don’t know the word “dynamic”. Can you show more examples?

ORACLE: Checks the evaluation of image 05010 according to its internal state.
The image 05010 is distant from the images in the training set for “happy”
with respect to colour features. Browse the training set and tell me if you are
using the word “happy” with a different nuance. Are colour features important for
“happy”?

USER: Image 05010 is “happy: freedom”, 05100 is “happy: cheerful”, ....

ORACLE: Creates (or updates) the “freedom” and “cheerful” modules. Creates
nuance relations between “happy” and the two words. Splits the training set.

USER: I want “happy” images.

ORACLE: Browses the database looking for images that activate the learning
module “happy” and satisfy the relations of the word “happy”.

These are happy images: 05117, 05093, 05029, 05120, 05074, 05069, 05059, 05030,
05115, 05003. Do you agree?

USER: I disagree, 05117, 05120, 05074, 05069, 05030, 05003 are not “happy”.

ORACLE: Compares the user feedback with the state of the “happy” module.

I added images 05120, 05074, 05069 to the training set. But I found inconsistencies
with previous evaluations: 05003 is similar to 480 and 577 (“happy” images), 05030
is similar to 480 (“happy” image), 05093 is similar to 537 (“happy” image). Why do
you disagree for image 05003?

USER: It’s too “dark”.

ORACLE: Checks whether the word “dark” exists and whether there are any relations
between “dark” and “happy” that are inconsistent with this feedback. Checks whether
any “happy” image in the training set is “dark”. If there is no problem, it adds a
negative relation from “happy” to “dark” (otherwise it reports the inconsistencies
to the user).

I added a negative relation from “happy” to “dark”.

7 System Evaluation

We tested our system in the following scenario: users were asked to identify
pictures they would like to display on a greeting card for a friend, according
to an objective keyword and a subjective impression they wanted to convey
to their friend. First, users were asked to perform the task using their usual
WEB-based search engine. The same task was then repeated using K-DIME, with the
possibility of personalizing the initial Oracle.

Early results show the benefits of our method in the following areas: (a) the
effective ability of the system to model and personalize a user model and (b) the
reduction of the search time necessary for a user to find a satisfactory image or
a set of images.

Table 2 shows a first evaluation of K-Dime, comparing its performance with that of
other search engines on the WEB. The search engines mentioned in the 3rd column
were queried using the objective keyword in the 2nd column. Querying
those search engines with the impression word yields too poor a set, because the
search is made on the HTML text surrounding the picture rather than on its
visual content. In other words, unless the caption of the picture literally specifies
that an image is “romantic”, querying a search engine with “Hawaii” and
“Romantic” is not likely to yield an exploitable set of images. K-Dime, instead, is
queried using both objective and subjective keywords (1st column).

The 4th column indicates the number of images considered relevant by the user 
compared to the number of images browsed by the user after querying the search 
engine indicated in the 3rd column. The ratio of images browsed by the user ver- 
sus the total number of images fetched by the search engine is given by comparing 
the right number in columns 4 and 6. 

The 5th column provides the same ratio when K-Dime is queried. Again, the 
ratio of images browsed by the user versus the total number of images retrieved 
by K-Dime is given by comparing the right number in columns 5 and 6. 

The last column gives the ratio between the number of images K-Dime extracted
from the output of the search engine and the total size of that output.



Table 2. Performances on some impression words and objective words (see details in
text).

Impression  Objective  Search     Search Engine  K-DIME     Image No. by
Word        Word       Engine     Precision      Precision  K-DIME/SEngine
Fresh       Hawaii     Altavista  8/144          7/21       148/11814
Happy       America    Altavista  6/78           5/16       150/24180
Happy       Carnival   Altavista  15/208         15/31      484/2753
Sad         Carnival   Altavista  15/80          14/48      200/2753
Romantic    Airplanes  Altavista  15/108         13/24      281/2900
Romantic    Maui       Lycos      6/264          6/14       300/2900
Natural     Holidays   Lycos      0/500          4/12       748/11814
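To read Table 2 as precision figures, the fractions can be converted and pooled as below. The per-row values and the pooled (micro-averaged) precision are simple arithmetic on the table entries, not figures reported by the authors.

```python
from fractions import Fraction

# (impression word, objective word, search engine relevant/browsed, K-DIME relevant/browsed)
TABLE_2 = [
    ("Fresh", "Hawaii", (8, 144), (7, 21)),
    ("Happy", "America", (6, 78), (5, 16)),
    ("Happy", "Carnival", (15, 208), (15, 31)),
    ("Sad", "Carnival", (15, 80), (14, 48)),
    ("Romantic", "Airplanes", (15, 108), (13, 24)),
    ("Romantic", "Maui", (6, 264), (6, 14)),
    ("Natural", "Holidays", (0, 500), (4, 12)),
]

def precision(relevant, browsed):
    return relevant / browsed

def pooled(column):
    """Micro-averaged precision: total relevant images over total images browsed."""
    return Fraction(sum(r for r, _ in column), sum(b for _, b in column))

for word, obj, se, kd in TABLE_2:
    print(f"{word:9} {obj:10} engine {precision(*se):.3f}  K-DIME {precision(*kd):.3f}")

print(float(pooled([row[2] for row in TABLE_2])),   # about 0.047 (65/1382)
      float(pooled([row[3] for row in TABLE_2])))   # about 0.386 (64/166)
```

Pooling weights each query by the number of images the user actually browsed, so the long result lists returned by the search engines count proportionally.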



8 Conclusion 

New technologies should be capable of tailoring (or personalising) themselves
to their users so that a broader bandwidth of communication can be achieved.
K-DIME uses dialogues in pseudo-natural language with the user to refine its
understanding of his/her personality and achieve better performance in retrieving
images on the basis of text and visual subjective impressions. The retrieval
on the basis of text is achieved by exploiting existing web-based search engines.
The retrieved images are then integrated and filtered on the basis of their
low-level features to match the requested subjective visual impressions. Users
can develop their own Kansei model. With its ability to bootstrap a new user model
from the model of a user with a similar profile, our system significantly reduces the
workload generally associated with an online learning phase. Continuous adaptation
driven by specific patterns of interaction with the user enables K-DIME
to cope with the intrinsic variability of subjective impressions. We report a first
evaluation of the behavior of the system on a few subjective words. The
proposed framework should be applicable to other modalities such as sound or
video, provided that the appropriate low-level processing capabilities are
implemented. In the future, we would like to investigate the feasibility of creating
Kansei group models using the bootstrapping and adaptation mechanisms
provided by the current prototype. For example, this would be useful for a company
selecting multimedia data when advertising a new product to a given population
of users.
Abstract. Dynamic data is data that varies rapidly and unpredictably.
This kind of data is generally used in on-line decision making and hence
needs to be delivered to its users in conformance with application-specific,
time- or value-based requirements. The main issue in the dissemination
of dynamic web data such as stock prices, sports scores or weather
data is the maintenance of temporal coherency within user-specified
bounds. Since most web servers adhere to the HTTP protocol,
clients need to pull the data frequently, depending on the changes in the
data and the user’s coherency requirements. In contrast, servers that possess
push capability maintain state information pertaining to users’ requirements
and push only those changes that are of interest to a user. These
two canonical techniques have complementary properties. In a pure pull
approach, the level of temporal coherency maintained is low, while in a pure
push approach it is very high, but at the cost of a large state space
at the server, which results in a less resilient and less scalable system.
Communication overheads in pull-based schemes are high compared
to push-based schemes, since the number of messages exchanged in the
pull approach is higher than in the push-based approach. Based on these
observations, this paper explores different ways of combining the
two approaches so as to harness the benefits of both.



1 Dynamic Data Dissemination 

Dynamic data can be defined by the way the data changes. First, it changes
rapidly; changes can occur as often as once every few seconds. Second, it
changes unpredictably, making it very hard to use simple prediction techniques
or time-series analysis. A few examples of dynamic data are stock quotes, sports
scores and traffic or weather data. Such data is generally used in
decision making (for example, stock trading or weather forecasting) and hence
the timeliness of its delivery to users becomes very important.

Recent studies have shown that an increasing fraction of the data on the
world wide web is dynamic. Web proxy caches that are deployed to improve
user response times must track such dynamic data so as to provide users with
temporally coherent information. The coherency requirements on a dynamic data
item depend on the nature of the item and on user tolerances. To illustrate, a
user may be willing to receive sports and news information that is out-of-sync
by a few minutes with respect to the server, but may desire stronger
coherency on data items such as stock prices. A proxy can exploit
user-specified coherency requirements by fetching and disseminating only those
changes that are of interest and ignoring intermediate changes. For instance, a
user who is interested in changes of more than a dollar for a particular stock
price need not be notified of smaller intermediate changes. This can be
termed the problem of maintaining the desired temporal coherency between the
source and the user, with the proxy substantially improving the access time,
overheads and coherency.

S. Bhalla (Ed.): DNIS 2000, LNCS 1966, pp. 173-187, 2000.

We study mechanisms to obtain timely updates from web sources, based on
the dynamics of the data and the users’ need for temporal accuracy, by judiciously
combining push and pull technologies and by using proxies to disseminate
data within acceptable tolerance. Specifically, the proxies (maintained by
client organizations) ensure the temporal coherence of data, within the tolerance
specified, by tracking the amount of change in the web sources. Based on the
changes observed and the tolerances specified by the different clients interested
in the data, the proxy determines when to pull from the server next, and
pushes newly acquired data to the clients according to their temporal coherency
requirements.
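The proxy’s push side, for a value-based requirement c, can be sketched as follows: each client is notified only when the freshly pulled value differs from the last value that client received by more than its c. This is an illustrative sketch; the class and method names are hypothetical, and deciding when to pull next is left out.

```python
class CoherencyProxy:
    """Push to each client only the changes that exceed its coherency requirement c."""

    def __init__(self):
        self.clients = {}   # client -> [c, last value pushed to that client]

    def register(self, client, c, current_value):
        self.clients[client] = [c, current_value]

    def on_new_value(self, value):
        """Called after each pull from the server; returns the clients notified."""
        notified = []
        for client, state in self.clients.items():
            c, last = state
            if abs(value - last) > c:      # the client's copy would drift beyond c
                state[1] = value
                notified.append(client)
        return notified

proxy = CoherencyProxy()
proxy.register("trader", 1.0, 100.0)   # wants changes of more than a dollar
proxy.register("casual", 5.0, 100.0)
for price in (100.4, 101.2, 103.0, 106.0):
    print(price, proxy.on_new_value(price))
```

Each client’s state is just its tolerance and the last value it received, which is exactly the per-client state whose growth limits the scalability of pure push.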

Of course, if the web sources themselves were aware of the clients’ temporal
coherency requirements and were endowed with push capability, then we could
avoid the need for mechanisms such as the ones proposed here. Unfortunately,
this can lead to scalability problems and may also introduce the need to make 
changes to existing web servers (which do not have push capabilities) or to the 
HTTP protocol. 

In this paper we discuss in detail the advantages and disadvantages of using 
push and pull techniques and examine different approaches to combining both 
to obtain their advantages without suffering from their limitations. 



2 Maintaining Temporal Coherency 

Consider a proxy that caches several time-varying data items. To maintain
coherency of the cached data, each cached item must be periodically refreshed with
the copy at the server. For highly dynamic data it may not be feasible to maintain
strong cache consistency. An attempt to maintain strong cache consistency
will result in either heavy network load or heavy server load. We can exploit the
fact that the user may not be interested in every change happening at the source
to reduce both network utilization and server load.

We assume that a user specifies a temporal coherency requirement c for 
each cached item of interest. The value of c denotes the maximum permissible 
deviation of the cached value from the value at the server and thus constitutes 




Dissemination of Dynamic Data on the Internet 






the user-specified tolerance. Observe that c can be specified in units of time
(e.g., the item should never be out-of-sync by more than 5 minutes) or value
(e.g., the stock price should never be out-of-sync by more than a dollar). As
shown in Figure 1, the proxy sits between the user and the server, and handles
all communication with the server based on the user constraint. Given the value
of c, the proxy can use push- or pull-based techniques to ensure that the
temporal coherency requirement (tcr), $|U(t) - S(t)| \le c$, is satisfied, where
U(t) and S(t) denote the values of the data item seen by the user and held at the
server at time t.




Fig. 1. Proxy-based Model 



The fidelity of the data seen by users depends on the degree to which their
coherency needs are met. We define the fidelity f observed by a user to be the
total length of time for which the temporal coherency requirement holds
(normalized by the total length of the observations). In addition to specifying the
coherency requirement c, users can also specify their fidelity requirement f for each
data item, so that an algorithm capable of handling users’ fidelity requirements
(as well as their tcrs) can adapt to users’ fidelity needs.
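Fidelity as defined here can be computed from sampled traces of the user-side value U(t) and the server-side value S(t). The sketch assumes both are piecewise constant between sample times; the function name and trace format are hypothetical.

```python
def fidelity(times, user_values, server_values, c):
    """Fraction of the observation period during which |U(t) - S(t)| <= c.

    The value at index i holds from times[i] until times[i + 1]; the last
    sample only marks the end of the observation period.
    """
    held = 0.0
    for i in range(len(times) - 1):
        if abs(user_values[i] - server_values[i]) <= c:
            held += times[i + 1] - times[i]
    return held / (times[-1] - times[0])

# The server moves by 1.5 twice; the user's copy catches up 10 time units later each time.
times  = [0, 10, 20, 30, 40]
server = [100.0, 101.5, 101.5, 103.0, 103.0]
user   = [100.0, 100.0, 101.5, 101.5, 103.0]
print(fidelity(times, user, server, c=1.0))   # 0.5
```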

Traditionally, the problem of maintaining cache consistency has been addressed
by either server- or client-driven approaches. In the client-driven approach, the
cache manager contacts the source periodically to check the validity of the cached
data. We call this period the Time-To-Refresh or TTR. Choosing very small TTR
values helps keep the cache consistent, although at the cost of bandwidth. On
the other hand, very large TTR values may reduce network utilization, but only
at the cost of reduced fidelity. Polling-each-time and Adaptive TTR are examples
of client-driven techniques. Clearly, these techniques are based on the assumption
that an optimum TTR value can be predicted using some statistical information.
This may not be true for highly dynamic data, which changes unpredictably
and independently. The other class of algorithms is server-driven, wherein the
server takes the responsibility for either invalidating or updating the proxy cache.
Sending invalidation messages or pushing recent changes are examples of such
techniques.

Moreover, because of the dynamics of the data, none of the above techniques can
deliver high fidelity with optimal resource utilization. In the following sections
we explain how user-specified constraints can be used to offer high fidelity with
efficient use of available resources.
3 The Pull Approach

The Pull approach is the most traditional approach for maintaining the temporal
coherency of caches that hold dynamic data. In this model, each data item is
assigned a certain TTR when the data object is brought into the cache. From the
arrival of the data item until its TTR elapses, all requests for the data object
are satisfied from the cache without looking up the values at the data sources.
Thus, in this approach, the proxy is responsible for obtaining the data from the
server: the proxy issues a GET request to the server and the server just delivers
the required data.
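The pull model’s caching rule can be sketched as a small TTR cache. `fetch` is a hypothetical callable standing in for the HTTP GET to the server, a fixed TTR per cache is assumed, and the injectable clock only makes the sketch testable.

```python
import time

class TTRCache:
    """Pull-based cache: an entry is served from the cache until its TTR expires,
    after which the next request triggers a fetch from the source."""

    def __init__(self, fetch, ttr_seconds, clock=time.monotonic):
        self.fetch, self.ttr, self.clock = fetch, ttr_seconds, clock
        self.entries = {}   # key -> (value, expiry time)

    def get(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]                 # still fresh: no contact with the source
        value = self.fetch(key)           # stands in for a GET request to the server
        self.entries[key] = (value, now + self.ttr)
        return value
```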



3.1 Periodic Pull 




Fig. 2. Periodic Polling aka WebCasting 



Most current applications use WebCasting [16], i.e., periodic polling, as 
shown in Fig. 2. The user registers with the proxy, which does “webcasting” 
with a constraint: the proxy polls the server for this data periodically and, 
whenever a change of interest to the user has occurred, pushes the change to the 
user. This approach is equivalent to setting the TTR value of a cached item 
statically. The proxies obtain data from the data sources with such a high 
frequency that the user gets the impression that the data is being pushed by the 
server. However, this can lead to very high network overhead if the polling 
period is too short, or may cause the user to miss some changes of interest if the 
polling period is too long. Clearly, this technique is useful only if the rate of 
change of the data is constant or relatively low (such as news). If the rate of 
change itself varies (as in the case of stock quotes), then assigning the polling 
frequency a priori is not suitable. Still, this is currently the most popular data 
delivery technique, as it can be purely web-based (using HTTP) and does not 
need any special resources (such as push capability in servers or modifications 
to HTTP). 
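The trade-off between polling overhead and missed changes can be made concrete with a small simulation. This is a sketch under stated assumptions: the source value is sampled once per time step, U(t) is read as the cached copy and S(t) as the source value (our interpretation of the constraint |U(t) - S(t)| <= c), and the trace is hypothetical.

```python
from typing import List, Tuple


def periodic_poll(trace: List[float], period: int, c: float) -> Tuple[int, int]:
    """Simulate periodic pull (a static TTR of `period` steps) over a value trace.

    Returns (number of polls, number of time steps at which the cached value
    violates the coherency requirement |U(t) - S(t)| <= c).
    """
    polls, violations = 0, 0
    cached = trace[0]
    for t, source_value in enumerate(trace):
        if t % period == 0:       # static TTR: poll every `period` steps
            cached = source_value
            polls += 1
        if abs(cached - source_value) > c:
            violations += 1
    return polls, violations


trace = [100, 100, 101, 103, 103, 104, 110, 110]   # hypothetical source values
print(periodic_poll(trace, period=1, c=1))  # (8, 0): full fidelity, maximal traffic
print(periodic_poll(trace, period=4, c=1))  # (2, 3): little traffic, missed changes
```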

3.2 Aperiodic Pull 

Since dynamic data changes independently and unpredictably, we cannot use 
standard prediction and forecasting algorithms for predicting the next TTR 
value to be assigned to the data object. A new method for assigning TTR values 
is given in [15]. Given a user’s coherency requirement, this technique allows a 
proxy to adaptively vary the TTR value based on the rate of change of the data 
item (Fig. 3). The TTR decreases dynamically when a data item starts changing 
rapidly and increases when a hot data item becomes cold. To achieve this, the 
Adaptive TTR approach takes into account (a) static bounds, so that TTR values 
are not set too high or too low, (b) the most rapid changes that have occurred 
so far, and (c) the most recent changes to the polled data. 

Fig. 3. Adaptive Polling. Panel labels: |U(t) - S(t)| <= c; c is the user's coherency requirement (constraint). 

In what follows, we use $D_0, D_1, \ldots, D_l$ to denote the values of a data item 
D at the server in chronological order. Thus, $D_l$ is the most recent value of data 
item D. 

The adaptive TTR is computed as: 

$$TTR_{adaptive} = \max\bigl(TTR_{min},\ \min\bigl(TTR_{max},\ a \times TTR_{mr} + (1 - a) \times TTR_{dyn}\bigr)\bigr)$$

where 

— $[TTR_{min}, TTR_{max}]$ denotes the range within which TTR values are bound. 

— $TTR_{mr}$ denotes the most conservative, i.e., smallest, TTR value used so far. 
If the next TTR is set to $TTR_{mr}$, temporal coherency will be maintained 
even if the maximum rate of change observed so far recurs. However, this 
TTR is pessimistic, since it is based on the worst-case rate of change at the 
source. If this worst-case rapid change occurs only for a short duration, 
this approach is likely to waste a lot of bandwidth, especially if the user can 
tolerate some loss of fidelity. 

— $TTR_{dyn}$ is a learning-based TTR estimate founded on the assumption that 
the dynamics of the last few (two, in the case of the formula below) recent 
changes are likely to reflect changes in the near future. 



$$TTR_{dyn} = (w \times TTR_{estimate}) + ((1 - w) \times TTR_{latest})$$



where 

• $TTR_{estimate}$ is an estimate of the TTR value, based on the most recent 
change to the data: 

$$TTR_{estimate} = \frac{TTR_{latest}}{|D_{latest} - D_{penultimate}|} \times c$$

If the recent rate of change persists, $TTR_{estimate}$ will ensure that changes 
which are greater than or equal to c are not missed. 
• weight w ($0.5 \le w < 1$, initially 0.5) is a measure of the relative change 
between the recent and the old changes, and is adjusted by the system 
so that we have the recency effect, i.e., more recent changes affect the 
new TTR more than older changes. 

— $0 < a < 1$ is a parameter of the algorithm and can be adjusted dynamically 
depending on the fidelity desired, with a higher fidelity demanding a higher 
value of a. 

The adaptive TTR approach has been experimentally shown to have the best temporal 
coherency properties among several TTR assignment approaches [15]. 
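The formulas above can be combined into a small per-item state machine. This is a minimal sketch, not the implementation from [15]: it assumes $TTR_{mr}$ is maintained as the running minimum of $TTR_{estimate}$ values, treats an unchanged value as a signal that the item is cold (using $TTR_{max}$ as the estimate), starts with $TTR_{latest} = TTR_{min}$, and keeps w fixed instead of adjusting it. The class name and the values of a, w, c, and the bounds are illustrative.

```python
class AdaptiveTTR:
    """Per-item Adaptive TTR state following the formulas above (simplified)."""

    def __init__(self, c, ttr_min, ttr_max, a=0.5, w=0.5):
        self.c = c                  # user's coherency requirement
        self.ttr_min = ttr_min      # static lower bound on TTR
        self.ttr_max = ttr_max      # static upper bound on TTR
        self.a = a                  # weight of the conservative TTR_mr term
        self.w = w                  # recency weight; kept fixed in this sketch
        self.ttr_mr = ttr_max       # smallest TTR estimate seen so far (assumption)
        self.ttr_latest = ttr_min   # TTR used for the last poll (assumed start value)

    def next_ttr(self, d_penultimate, d_latest):
        """Return the TTR to use after a poll that observed d_latest."""
        change = abs(d_latest - d_penultimate)
        if change > 0:
            # Time for the value to drift by c if the latest rate of change persists.
            ttr_estimate = self.ttr_latest / change * self.c
        else:
            # No change observed: treat the item as cold (assumption).
            ttr_estimate = self.ttr_max
        self.ttr_mr = min(self.ttr_mr, ttr_estimate)
        ttr_dyn = self.w * ttr_estimate + (1 - self.w) * self.ttr_latest
        blended = self.a * self.ttr_mr + (1 - self.a) * ttr_dyn
        ttr = max(self.ttr_min, min(self.ttr_max, blended))
        self.ttr_latest = ttr
        return ttr


ttr = AdaptiveTTR(c=0.5, ttr_min=1, ttr_max=60)
t1 = ttr.next_ttr(100, 100)   # no change yet: TTR stays large
t2 = ttr.next_ttr(100, 102)   # value jumped by 2: TTR shrinks
print(t1, t2)                 # 45.25 19.796875
```

The clamp to $[TTR_{min}, TTR_{max}]$ is applied last, so however extreme an observed change is, the proxy never polls faster or slower than the static bounds allow.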

4 The Push Approach 

In this method, the server is responsible for delivering the relevant data to the 
user. The server does not operate in request-response mode, where it delivers 
data only when there is an explicit request for it; instead, it pushes the data 
into the channel without any explicit request. As before, the server can push 
the data either periodically or aperiodically. 



4.1 Periodic Push 



Fig. 4. Periodic Push (Channel acts like a data medium). Panel labels: SERVER, Periodic Push, CHANNEL, Pull, PROXY (cache), Push, USER, Client System. 



In this method (Fig. 4), the server is not aware of the exact coherency 
needs of a particular client, but only of the general demand for data items. 
So, based on the general demand distribution of data items, the server creates 
a schedule for disseminating them. A data item with higher demand is 
disseminated with higher frequency, and vice versa. All data items are 
divided into frequency bands, where data items belonging to one frequency band 
have similar demands. Once the push schedule is created using these frequency 
bands, it is not changed; the server simply repeats the schedule periodically. 
The Broadcast Disks [1] approach (Fig. 5) is one such approach, where the 
frequency bands are termed broadcast disks. This approach also provides for 
client feedback. An interesting property of this approach is that it treats the 
channel like a medium and tries to decide on the “format” in which the channel 
should hold the data. In a way, the channel itself is acting like a proxy. 
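A static push schedule built from frequency bands can be sketched as follows. The sketch follows the chunk-interleaving idea associated with Broadcast Disks [1] as we understand it: each band is split into chunks so that items in band i appear `rel_freqs[i]` times per period. The band contents, relative frequencies, and function name are hypothetical, and client feedback is not modeled.

```python
from math import lcm
from typing import List, Sequence


def broadcast_program(bands: Sequence[Sequence[str]], rel_freqs: Sequence[int]) -> List[str]:
    """Build one period of a flat broadcast schedule from frequency bands.

    bands[i] holds the items of band i (hottest first); rel_freqs[i] is how many
    times per period each item in band i is broadcast.
    """
    max_chunks = lcm(*rel_freqs)
    chunks_per_band = [max_chunks // f for f in rel_freqs]
    program: List[str] = []
    for minor_cycle in range(max_chunks):
        for band, n_chunks in zip(bands, chunks_per_band):
            size = -(-len(band) // n_chunks)  # ceiling division: items per chunk
            k = minor_cycle % n_chunks        # which chunk of this band to send now
            program.extend(band[k * size:(k + 1) * size])
    return program


bands = [["A"], ["B", "C"], ["D", "E", "F", "G"]]  # hypothetical demand bands
print(broadcast_program(bands, rel_freqs=[4, 2, 1]))
# ['A', 'B', 'D', 'A', 'C', 'E', 'A', 'B', 'F', 'A', 'C', 'G']
```

The server then repeats this list forever; a client that needs item "G" may wait up to a full period, which is why such schedules suit generic, slowly changing data better than stringent coherency requirements.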
Fig. 5. Broadcast Disks 



Since the server is not aware of the specific needs of the users, the push 
schedule may not be adequate for users who desire high fidelity and specific 
results. The approach may therefore lead to low fidelity and/or wasted 
bandwidth. However, it is useful for data that is generic and does not change 
very rapidly (e.g., news, digests, entertainment). 



4.2 Aperiodic Push 




Fig. 6. Aperiodic Push. Panel labels: |U(t) - S(t)| <= c; c is the user’s coherency requirement (constraint), and the server knows the constraint. 



In the aperiodic push-based approach, the proxy registers with a server, 
identifying the data of interest and the associated temporal coherency 
requirement (tcr), i.e., the value c. Whenever the value of the data changes, the 
server uses the tcr value c to determine whether the new value should be pushed 
to the proxy; only those changes that are of interest to the user (based on the 
tcr) are actually pushed (Fig. 6). Formally, if $D_k$ was the last value that was 
pushed to the proxy, then the current value $D_l$ is pushed if and only if 
$|D_l - D_k| \ge c$, $0 \le k < l$. To achieve this objective, the server needs to 
maintain state information consisting of a list of proxies interested in each data 
item, the tcr of each proxy, and the last update sent to each proxy. 
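The per-proxy state and the push condition above can be sketched directly. This is a minimal illustration, assuming each proxy starts out holding the value current at registration time; the class name, proxy ids, and item name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class AperiodicPushServer:
    """Per-item push state: for each proxy, its tcr c and the last value pushed."""

    def __init__(self) -> None:
        # item -> proxy id -> [c, last value pushed to that proxy]
        self.state: Dict[str, Dict[str, list]] = defaultdict(dict)

    def register(self, proxy: str, item: str, c: float, current: float) -> None:
        """Proxy registers its tcr; it is assumed to start with `current`."""
        self.state[item][proxy] = [c, current]

    def on_update(self, item: str, new_value: float) -> List[str]:
        """Return the proxies to which new_value must be pushed."""
        pushed = []
        for proxy, entry in self.state[item].items():
            c, last = entry
            if abs(new_value - last) >= c:   # |D_l - D_k| >= c
                entry[1] = new_value         # remember the new D_k for this proxy
                pushed.append(proxy)
        return pushed


server = AperiodicPushServer()
server.register("p1", "IBM", c=1.0, current=100.0)
server.register("p2", "IBM", c=5.0, current=100.0)
print(server.on_update("IBM", 100.5))  # []
print(server.on_update("IBM", 101.2))  # ['p1']
print(server.on_update("IBM", 105.0))  # ['p1', 'p2']
```

Every registered proxy costs one entry per item, which is the state-space overhead discussed next.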

The key advantage of this approach is that it can meet stringent coherency 
requirements: since the server is aware of every change, it can precisely 
determine which changes to push and when. A limitation of push-based servers is 
that the amount of state that needs to be maintained can be large, especially for 
popular data items. A server can reduce the state space overhead by combining 
requests from all proxies with identical tcrs into a single request; all these 
proxies are notified if the change to the data item D exceeds the specified tcr. 
Even with such optimizations, the state space overhead can be excessive, which 
in turn limits the scalability of the server. A further limitation of the approach 
is that it is not resilient to failures: the state information is lost if the server 
fails, which requires the proxy to detect the failure and re-register its tcr for 
the data item. 
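The optimization of combining proxies with identical tcrs can be sketched for a single data item D. In this hypothetical version, one entry is kept per distinct tcr rather than per proxy. A proxy that joins an existing group adopts the group's last pushed value, which is an approximation this sketch accepts.

```python
from typing import Dict, Set


class GroupedPushState:
    """Push state for one data item D, with proxies grouped by identical tcr."""

    def __init__(self) -> None:
        self.groups: Dict[float, Set[str]] = {}      # tcr c -> proxies sharing it
        self.last_pushed: Dict[float, float] = {}    # tcr c -> last value pushed to group

    def register(self, proxy: str, c: float, current: float) -> None:
        if c not in self.groups:
            self.groups[c] = set()
            self.last_pushed[c] = current
        self.groups[c].add(proxy)   # a late joiner adopts the group's last value

    def on_update(self, new_value: float) -> Set[str]:
        """Check each distinct tcr once and notify every proxy in a triggered group."""
        notified: Set[str] = set()
        for c, proxies in self.groups.items():
            if abs(new_value - self.last_pushed[c]) >= c:
                self.last_pushed[c] = new_value
                notified |= proxies
        return notified


state = GroupedPushState()
state.register("p1", c=1.0, current=100.0)
state.register("p2", c=1.0, current=100.0)
state.register("p3", c=5.0, current=100.0)
print(len(state.groups))         # 2 groups for 3 proxies
print(sorted(state.on_update(101.5)))  # ['p1', 'p2']
```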

5 Push vs. Pull 

Push and Pull approaches have complementary properties with respect to fidelity, 
network utilization, scalability and resiliency. These properties are summarized 
in Table 1. 

5.1 Communication Overheads 

In a push-based approach, the number of messages transferred over the network 
equals the number of times the data changes by enough that a push is needed to 
maintain the user-specified temporal coherency. A pull-based approach requires 
two messages per poll: an HTTP request followed by a response. Moreover, in the 
pull approach, a proxy polls the server based on its estimate of how frequently 
the data is changing. If the data actually changes at a slower rate, the proxy 
might poll more frequently than necessary. Hence a pull-based approach is liable 
to impose a larger load on the network. However, a push-based approach may push 
to clients who are no longer interested in a piece of information, thereby 
incurring unnecessary message overhead. The communication overhead also depends 
on the dynamics of the data. For rapidly changing data, the cache manager must 
poll the source very frequently in order to maintain high fidelity. As the rate at 
which the data changes also varies with time, many of these requests may prove 
useless, incurring unnecessary network load. Similarly, if the data is changing 
very slowly, many polls may again prove useless. 
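The message counts compared above can be computed for a value trace. This sketch assumes the proxy initially holds `trace[0]` and ignores connection setup and any control messages. The trace and the number of polls are hypothetical.

```python
from typing import Sequence


def push_messages(trace: Sequence[float], c: float) -> int:
    """One push per change that violates the tcr relative to the last pushed value."""
    last_pushed = trace[0]
    count = 0
    for value in trace[1:]:
        if abs(value - last_pushed) >= c:
            last_pushed = value
            count += 1
    return count


def pull_messages(polls: int) -> int:
    """Each poll costs an HTTP request plus its response."""
    return 2 * polls


trace = [10.0, 10.2, 10.4, 11.1, 11.0, 12.5]   # hypothetical source values
print(push_messages(trace, c=1.0))  # 2
print(pull_messages(polls=5))       # 10
```

Here a proxy polling at every one of the five later updates sends five times as many messages as push, and most of those polls return changes smaller than c.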



Table 1. Properties of Push and Pull

Algorithm | Resiliency | Temporal Coherency           | Overheads (Scalability)
          |            |                              | Communication | Computation | State Space
Push      | Low        | High                         | Low           | High        | High
Pull      | High       | Low (for small constraints)  | High          | Low         | Low
          |            | High (for large constraints) |               |             |



5.2 Computational Overheads 

Computational overheads for a pull-based server result from the need to deal 
with individual pull requests. After getting a pull request from the proxy, the 
server just has to look up the latest data value and respond. On the other hand, 
when the server has to push changes to the proxy, then for each change that 
occurs, the server has to check whether the coherency requirement of any of the 
proxies has been violated. This computation is directly proportional to the rate 
of arrival of new data values and the number of unique temporal coherency 
requirements associated with that data item. Hence the computational overhead 
per data item is of the order of the rate of arrival of new values times the 
number of unique coherency requirements associated with that item. Although this 
is a time-varying quantity, in the sense that the rate of arrival of data values 
as well as the number of connections change with time, it is easy to see that 
push is computationally more demanding than pull. 
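As a rough worked form of this argument (the symbols are ours, not from the text): let $\lambda_D$ be the rate at which new values of item D arrive at the server, $u_D$ the number of distinct temporal coherency requirements registered for D, and $r_D$ the rate of pull requests for D. To first order,

$$W_{push}(D) \approx \lambda_D \cdot u_D \ \text{checks per unit time}, \qquad W_{pull}(D) \approx r_D \ \text{lookups per unit time}.$$

For instance, with hypothetical values $\lambda_D = 10$ updates/s and $u_D = 4$, a push server performs about $10 \times 4 = 40$ coherency checks per second for D, regardless of how often any proxy would have polled.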

However, for each pull request, the server has to open a new connection with the 
client and close it after the request is served, and opening and closing 
connections clearly imposes a resource overhead. Hence the above observation may 
not hold if a large number of clients are interested in similar data items. In 
such cases, monitoring just one item can satisfy many client requirements, i.e., 
the cost of monitoring that data item is amortized over a large number of 
clients. This cost is less than that of serving individual requests from each of 
the proxies, so it makes sense to use push connections in such situations. In 
short, high computational load may arise either from too much polling of the 
server or from too much monitoring load. 

5.3 Space Overheads 

A pull-based server is stateless. In contrast, a push-based server must maintain 
the c value for each client and the latest pushed value, along with the state 
associated with an open connection. Since this state is maintained throughout the 
duration of client connectivity, the number of clients that the server can handle 
may be limited when the state space overhead becomes large (resulting in 
scalability problems). 

5.4 Resiliency 

By virtue of being stateless, a pull-based server is resilient to failures. In contrast, 
a push server maintains crucial state information about the needs of its clients; 
this state is lost when the server fails. Consequently, the client’s coherency re- 
quirements will not be met until the proxy detects the failure and re-registers 
the coherency requirements with the server. 

In push-based techniques, we can classify failures as server-side, client-side, or 
communication failures. Each of these has different implications for the behavior 
of the system. 

— In case of server failures, the state at the server is lost. Most push 
algorithms require state to be maintained at the server, and hence their 
correctness may be compromised in such cases. Cache coherency is not guaranteed 
until the state is reconstructed at the server. It may not be possible for a 
server to initiate error recovery, because it has no way to know which clients 
were being served when the crash occurred. Clients must therefore be able to 
detect the server failure so that they can start error recovery. 
— Clients may also fail. A server has to allocate resources to each client, and 
since resources are valuable, they must be reclaimed when clients become 
unreachable. Push-based techniques rely on some kind of feedback from the 
clients to handle client failures. This may generate additional control 
messages, adding to the total communication overhead. 

— Communication failures occur due to socket failures at either end, network 
congestion, or network partitions. Push-based techniques must employ special 
mechanisms to deal with such errors. 
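One common form of the client feedback mentioned above is a periodic heartbeat. The sketch below assumes each client sends heartbeats and that a client silent for longer than a fixed timeout is treated as failed and its state reclaimed. The heartbeat scheme, the timeout, and the names are our assumptions, not a mechanism specified in the text.

```python
from typing import Dict, List


class ClientRegistry:
    """Tracks client liveness so a push server can reclaim per-client resources."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                 # silence tolerated before reclaiming
        self.last_seen: Dict[str, float] = {}  # client id -> time of last heartbeat

    def heartbeat(self, client: str, now: float) -> None:
        self.last_seen[client] = now

    def reclaim(self, now: float) -> List[str]:
        """Drop (and return) clients that have been silent longer than the timeout."""
        dead = [c for c, t in self.last_seen.items() if now - t > self.timeout]
        for c in dead:
            del self.last_seen[c]              # a real server would also free sockets, buffers
        return dead


registry = ClientRegistry(timeout=30.0)
registry.heartbeat("p1", now=0.0)
registry.heartbeat("p2", now=20.0)
print(registry.reclaim(now=40.0))  # ['p1']
```

The heartbeats themselves are the additional control messages that add to the communication overhead.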



5.5 Scalability 

Pull servers are generally stateless and hence scalable. A server has to respond 
to the incoming request, but need not maintain any state information or keep the 
connection open with the client after the request is satisfied. Since open 
connections consume sockets and buffer space, unneeded connections must be 
closed. With an upper bound on the number of sockets and on the available state 
space, this property is very desirable and often helps in making servers 
scalable. 

Web servers deployed all over the world are pull-based and stateless. A user 
sends a request and waits for the response. The primary consideration has been 
to make web servers scalable. This works for normal applications, but may not 
hold for data that changes rapidly and at varying rates. There is a certain 
overhead associated with opening and closing connections, and sockets, once used, 
may remain unavailable for some period of time. When data at a source is 
changing very fast, the proxy will generate a large number of requests to keep 
its cache in sync with the source, so there will be a large overhead in opening 
and closing connections. The computational load at the server also becomes high 
because it has to respond to far more requests. The socket queues start filling 
up, increasing the response time, and eventually the server may start dropping 
requests. 

Push servers have complementary characteristics. The server has to keep 
sockets open and allocate enough buffers to handle each client. With a large 
number of clients, state space and network resources can soon become bottlenecks, 
and the server may start dropping requests. In short, scalability problems may 
arise either from excessive server computation and network traffic, or from the 
state space maintained and the resources (such as sockets) allocated at the 
server, and there is a clear tradeoff between these two constraints. 

6 Need to Combine Push and Pull 

From the above section it is clear that: 

— A pull-based approach does not offer high fidelity when the data changes 
rapidly or when the coherency requirements are stringent (i.e., small values 
of c). Moreover, the pull-based approach imposes a large communication 
overhead (in terms of the number of messages exchanged) when the number 
of clients is large. But it may suffice for requests which have large values of c. 

— A push-based algorithm can offer high fidelity for rapidly changing data 
and/or stringent coherency requirements. However, it incurs a significant 
computational and state-space overhead resulting from a large number of 
open push connections and their serving processes/threads. Moreover, the 
approach is less resilient to failures due to its stateful nature. 

These properties indicate that a push-based approach is suitable when a 
client expects its coherency requirements to be satisfied with high fidelity, or 
when the communication overheads are a bottleneck. A pull-based approach is 
better suited to less frequently changing data or less stringent coherency 
requirements, and when resilience to failures is important. The complementary 
properties of the two approaches indicate the need for an approach that 
combines the advantages of both without suffering from their disadvantages. 

7 Combinations of Push and Pull 

As is clear from the previous discussion, neither push nor pull alone is sufficient 
for efficient dissemination of dynamic data. These two techniques have 
complementary properties with respect to fidelity offered, network utilization, 
server scalability and resiliency. Few attempts have been made in the past to 
combine these two canonical techniques; Adaptive Leases [8] and Volume Leases 
[13] are two examples. The former is used for maintaining strong cache 
consistency in the World Wide Web, while the latter is used for caches holding a 
large number of data items. Neither is meant for highly dynamic data, nor do they 
take user requirements into account. In this section we describe the Adaptive 
Leases approach. We also describe two algorithms that we have developed for 
better scalability and coherency for dynamic data. 

7.1 Leases 

Leases are like contracts given to a lease holder over some property [4]. Whenever 
a client requests a document from the server, the server returns that document 
along with a lease. In other words, the server takes the responsibility of informing 
the client about any changes during the lease period. Once a lease expires, the 
client must contact the server and renew the lease. A client can use the cached 
copy while it has a valid lease over the data item. During the lease period, the 
client remains in push mode, and it is switched back to pull mode after the lease 
expires. Thus the client is alternately served in push and pull modes. 
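The lease mechanics described above can be sketched with a fixed lease period. The class, function, and client names are hypothetical, and renewal is modeled simply as a new request.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LeaseServer:
    lease_period: float
    leases: Dict[str, float] = field(default_factory=dict)  # client -> lease expiry

    def request(self, client: str, now: float) -> float:
        """Serve a request (or renewal) and grant a lease; return its expiry."""
        expiry = now + self.lease_period
        self.leases[client] = expiry
        return expiry

    def on_change(self, now: float) -> List[str]:
        """Clients the server must notify: those whose lease is still valid."""
        return [client for client, expiry in self.leases.items() if now < expiry]


def client_mode(expiry: float, now: float) -> str:
    """Push mode while the lease is valid, pull mode (renew) after it expires."""
    return "push" if now < expiry else "pull"


server = LeaseServer(lease_period=60.0)
expiry = server.request("p1", now=0.0)
server.request("p2", now=50.0)
print(server.on_change(now=70.0))     # ['p2']: p1's lease has expired
print(client_mode(expiry, now=61.0))  # pull
```

The single `lease_period` parameter is exactly the knob whose choice the next paragraph discusses.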

Clearly, pure leases are not very useful for dynamic data, and it is very important 
to choose a good lease period. For a very long lease period, the client remains in 
push mode most of the time and scalability problems may arise. On the other hand, 
for short lease periods the lease renewal cost may prove very high. Adaptive 
Leases try to adjust the lease duration dynamically. The decision is based on many 
parameters, such as the popularity of the data item, the server state space 
available, and the network bandwidth available. 

7.2 Dynamically Combining Push and Pull: PaP 

In the PaP [7] approach, the proxy operates in pull mode using some TTR 
algorithm, while the server is in push mode and knows the constraint. Using this 
constraint and the proxy's access patterns, the server tries to predict when a 
client is going to poll next. If it determines that the client is likely to miss a 
change of interest before this predicted time, it pushes that change to the client. 
To predict client connection times, the server may run the TTR algorithm in 
parallel with the client or use some simpler approximation of it. Given network 
delays, the server waits for the client to pull within a small window around the 
predicted time. So, once the first few changes have been communicated to the 
client (by pull or push), the client can easily learn of the subsequent changes. 
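The server-side decision in PaP can be sketched as follows. This assumes the server already has a predicted next-poll time for the proxy (for example, from running the proxy's TTR algorithm in parallel, as described above) and knows the last value the proxy holds. The function name and parameters are hypothetical, and the rule is our simplification: a change of interest is pushed only if it arrives before the window around the predicted poll. The fallback when the proxy fails to poll inside the window is not modeled.

```python
def should_push(change_time: float, new_value: float, last_known: float,
                c: float, predicted_poll: float, window: float) -> bool:
    """Server-side PaP decision for one proxy (simplified sketch).

    last_known: value the server believes the proxy currently holds.
    predicted_poll: server's prediction of the proxy's next pull.
    window: tolerance for network delay around the predicted poll.
    """
    if abs(new_value - last_known) < c:
        return False  # not a change of interest under this proxy's constraint
    # Close to the predicted poll: let the proxy pick the change up by pulling.
    # Otherwise the proxy would miss it until then, so push it now.
    return change_time < predicted_poll - window


print(should_push(10.0, 105.0, 100.0, c=2.0, predicted_poll=30.0, window=1.0))  # True
print(should_push(29.5, 105.0, 100.0, c=2.0, predicted_poll=30.0, window=1.0))  # False
print(should_push(10.0, 101.0, 100.0, c=2.0, predicted_poll=30.0, window=1.0))  # False
```

Widening `window` shifts work toward pull; shrinking it shifts work toward push, which mirrors the tuning knobs PaP is described as having.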

In the ideal case, the fidelity offered will be 100%, but due to synchronization 
problems and other factors, it will be slightly less. Even so, it will always be much 
higher than that of pull. Because of the pull component, the resiliency of the 
system will be high, and because of the push component, communication 
overheads will be low. PaP also allows fine tuning of its behavior: it has a few 
parameters that swing it towards more push or more pull, so its performance in 
terms of fidelity and resiliency can be controlled. 

7.3 Dynamically Choosing Push or Pull: PoP 

Another possibility is to divide incoming clients at the server into push or pull 
clients and dynamically switch them from one mode to the other [7]. If resources 
are plentiful, every client is given a push connection irrespective of its fidelity 
requirements. This ensures that the best fidelity is offered. As more and more 
clients start requesting the service, resource contention may arise at the server, 
leading to scalability problems. A few clients are then shifted to pull mode, so 
valuable resources are freed and the system scales properly. Conversely, when 
resources become available again, a few high-priority clients are switched back 
to push mode, thus ensuring high fidelity. 

The most important issue is how to assign priorities to different clients. 
Possible parameters include the access frequency of each client, its temporal 
coherency requirement, its fidelity requirement, and the available network 
bandwidth. Clearly no single criterion suffices, but collectively they have the 
potential to offer high average fidelity while keeping the system scalable. 
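A PoP-style allocation step can be sketched as follows. It assumes the server can afford a fixed number of push connections (`push_capacity`) and ranks clients by a priority score. The scoring function below, which combines access rate with the stringency of c, is purely hypothetical. The text lists candidate parameters but gives no formula.

```python
from typing import Dict


def priority(access_rate: float, c: float, weight: float = 1.0) -> float:
    """Hypothetical score: frequent accessors with stringent (small) c rank higher."""
    return access_rate + weight / c


def assign_modes(priorities: Dict[str, float], push_capacity: int) -> Dict[str, str]:
    """Give push connections to the highest-priority clients; the rest pull."""
    ranked = sorted(priorities, key=lambda client: priorities[client], reverse=True)
    push_clients = set(ranked[:push_capacity])
    return {client: ("push" if client in push_clients else "pull")
            for client in priorities}


prios = {"a": priority(5, 0.1), "b": priority(1, 10), "c": priority(2, 0.5)}
print(assign_modes(prios, push_capacity=3))  # plentiful resources: all push
print(assign_modes(prios, push_capacity=2))  # contention: "b" is moved to pull
```

Re-running `assign_modes` whenever capacity changes gives the switching behavior described above: clients drop to pull under contention and the highest-priority ones return to push when resources free up.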

8 Related Work 

[2] is one of the earliest papers on maintaining coherency between a data source 
and cached copies of the data. It discusses techniques whereby data sources can 
propagate, i.e., push, updates to clients based on their coherency requirements. 
It also discusses techniques whereby cached objects can be associated with 
expiration times so that clients themselves can invalidate their cached copies. 

More recently, various coherency schemes have been proposed and investi- 
gated for caches on the World Wide Web where the sources are typically pull- 
based and stateless. Thus, the source is unaware of users’ coherency requirements 
and users pull the required data from the sources. 

A weak consistency mechanism, client polling, is discussed in [3], where 
clients periodically poll the server to check whether the cached objects have been 
modified. In the Alex protocol presented there, the client adopts an adaptive 
Time-To-Live (TTL) expiration time, which is expressed as a percentage of the 
object’s age. Simulation studies reported in [9] indicate that a weak-consistency 
approach like the Alex protocol [3] would be the best for web caching. The 
main metric used there is network traffic. While the Alex protocol uses only the 
time for which the source data remained unchanged, given our desire to keep 
temporal consistency within specified limits, we also need to consider the 
magnitude of the change. 
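The adaptive TTL idea attributed to the Alex protocol above can be sketched in a few lines. The threshold value and function names are illustrative. Note that the TTL depends only on how long the object has been unchanged, never on the size of a change, which is the limitation pointed out above.

```python
def alex_ttl(now: float, last_modified: float, threshold: float) -> float:
    """Adaptive TTL: a fixed fraction (threshold) of the object's current age."""
    age = now - last_modified
    return threshold * age


def needs_poll(fetched_at: float, ttl: float, now: float) -> bool:
    """The cached copy is treated as valid until fetched_at + ttl."""
    return now >= fetched_at + ttl


# An object last modified 40 time units ago, fetched now, with a 25% threshold:
ttl = alex_ttl(now=100.0, last_modified=60.0, threshold=0.25)
print(ttl)                                               # 10.0
print(needs_poll(fetched_at=100.0, ttl=ttl, now=105.0))  # False
```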

A strong consistency mechanism, server invalidation, is discussed in [11], 
where the server sends invalidation messages to all clients when an object is 
modified. That paper compares the performance of three cache consistency 
approaches and concludes that the invalidation approach performs the best. 

A survey of various techniques used by web caches for maintaining coherence, 
including the popular “expiration mechanism”, is found in [6]. It also discusses 
several extensions to this mechanism, but, as discussed in [15], these do not meet 
our needs. 

Another approach is for the cache server to piggyback a list of cached objects 
[12] whenever it communicates with a server. The piggybacked objects are those 
for which the expiration time is unknown or the heuristically determined TTL has 
expired. 



9 Concluding Remarks 

Since the frequency of changes of time-varying web data can itself vary over 
time (as hot objects become cold and vice versa), in this paper we argued 
that it is difficult to determine a priori whether a push- or pull-based approach 
should be employed for a particular data item. The choice is further complicated 
by the complementary properties of the two approaches with respect to resiliency 
as well as state-space and communication overheads. To address this limitation, 
we proposed two techniques that combine push- and pull-based approaches and 
adaptively determine which approach is best suited at a particular instant. While 
we focused only on the server-proxy data path, similar combinations of push and 
pull can also be adapted for the proxy-client data path. We are currently 
evaluating the performance, functionality, and overhead profiles of these new 
algorithms so as to determine the range of their applicability for disseminating 
dynamic Web data. 
Abstract. In business systems such as intranets, it is important to find 
valuable information in huge amounts of data within a short time. A traditional 
centralized search engine is not suitable for this purpose because it is difficult 
to update the index database of collected documents quickly. We therefore 
proposed the Cooperative Search Engine (CSE), in which multiple local meta 
search engines cooperate. Even if a search engine does not have index 
information, it can get the information by asking other search engines that have 
it. Furthermore, in CSE, each local meta search engine builds its own index 
database individually, without HTTP transfer. As a result, CSE reduces updating 
intervals to less than one hour in a network consisting of 10 hosts with 10^ 
documents. In this paper, we describe evaluations of both searching and updating 
in CSE. 

1 Introduction 

Recently, the need for information retrieval has been increasing, especially in 
corporations, universities, and similar organizations. In such organizations, it is 
very important to use fresh information. However, conventional search engines, 
which we usually use, take a very long time, e.g., one month, to update the 
indexes of all documents, because they are centralized systems. In centralized 
search engines, a WWW robot collects documents from servers worldwide, and an 
indexer generates their indexes. Because of this, it is difficult for centralized 
search engines to satisfy the requirements of such organizations. 

The Cooperative Search Engine (CSE) was proposed to solve some problems of 
centralized search engines. It is a distributed search engine in which multiple 
Local Meta Search Engines (LMSEs) search by cooperating with each other. In a 
typical centralized search engine, a search engine machine builds and owns an 
index database of all documents in a particular domain. In such a system, the 
collected documents and their indexes may be too large for a single machine to 
store. CSE is free from this problem because each LMSE independently builds an 
index of its own documents. 

S. Bhalla (Ed.): DNIS 2000, LNCS 1966, pp. 188-199, 2000. 
Generally, distributed search engines are suited to intranets because they have 
the following advantages over centralized search engines: short update intervals 
and no need for mass storage for indexes. However, distributed search engines 
also have the disadvantage that searching takes longer because of communication 
delay. Our CSE solves this problem by optimizing the set of sites to which a query 
is sent. In this paper, we show that CSE can make index updating intervals 
shorter than those of centralized search engines and that CSE has almost the 
same search performance as centralized search engines. 

The organization of this paper is as follows. In section 2, we describe related 
work on distributed search engines. In section 3, we provide an overview of CSE 
and its behavior. In section 4, we explain the scoring method of CSE, which 
affects the quality of search results. In section 5, we evaluate the performance of 
CSE. Finally, we summarize future work and conclusions. 



2 Related Works 

There have been many attempts to make information retrieval faster by using 
parallel and distributed processing. An information retrieval system consists of 
three parts: a robot or gatherer that collects documents, an indexer that builds 
indexes of the collected documents, and a search engine that searches the indexes 
to find particular documents. PRSM [3], Harvest [6], WebAnts [7], and others 
employ distributed robots that collect documents in parallel at distributed 
locations. The combination of distributed robots and a centralized search engine 
is suitable for a large domain like the whole Internet, but it is not suitable for 
intranets, because update intervals must be very short in such domains. 

FreshEye [10] reduces the update interval to one day by carefully choosing which 
documents to collect: FreshEye’s gatherer collects only frequently updated 
documents in a part of the Internet domains, so it cannot collect all documents of 
all domains in one day. In Infoseek [11], each web site needs to prepare the list 
of documents modified in one day, which is called robotl.txt. But neither 
approach can make the update interval as short as business users expect. 

On the other hand, several systems employ parallel or distributed search 
engines. Inktomi [12] searches using a workstation cluster. In Ingrid [4], Ingrid 
servers construct the Ingrid topology by linking related resources to each other. 
At search time, each Ingrid client searches this topology network to find 
documents. However, Ingrid needs many servers, and communication delay is not 
predictable. 

WHERE [5], an extension of Whois++, is a distributed search engine based on the 
tf·idf scoring method. The WHERE server has forwarding knowledge, but 
communication delay is not predictable because its routing structure is flat. 
In CSE, by contrast, there is a meta server that guides clients. 
3 Overview of CSE 

3.1 Structure of CSE 

CSE has three parts: a Local Search Engine (LSE) that searches only local 
documents, a Local Meta Search Engine (LMSE) that searches documents using an 
LSE and other LMSEs, and a Location Server (LS) that knows which documents 
each LMSE has. 

In CSE, these parts cooperate with each other, which makes it possible to search 
distributed documents. Only one LS exists in the whole CSE, while there may be 
any number of LMSEs. Although there is usually one LMSE per web server, a web 
server may host any number of LMSEs. In addition, an LMSE can hold documents 
that a robot collects from several web servers. An LMSE uses an LSE in order to 
access local documents. An LSE is a lightweight search engine such as Namazu or 
SGSE, which are popular Japanese full-text search engines for personal users. An 
LMSE searches documents by using an LSE and some of the LMSEs reported by the 
LS. This behavior model of CSE has the following advantages: 

— In centralized search engines, it is difficult to reduce the index updating 
interval, so they are not suitable for intranets, which require very short 
intervals. CSE, however, can update at very short intervals because it builds 
indexes locally and concurrently. 

— Some traditional distributed search engines require specific search clients, 
which entails a high initial cost to introduce them. CSE does not need a specific 
client. 



3.2 Distributed tf·idf Method in CSE 

The tf·idf method is a scoring method that considers not only the frequency of a 
keyword but also its rareness. It can be used with Boolean search. The tf is 
simply the frequency with which the keyword appears in a document. The idf is the 
degree of rareness of the keyword across all documents. In the tf·idf method, the 
score is defined by the following expressions: 



$$ \mathit{score} = \mathit{tf} \cdot \mathit{idf} \tag{1} $$

$$ \mathit{idf} = \log \frac{N}{n_k} \tag{2} $$

where N is the number of documents indexed by a particular search engine and n_k is the number of documents that contain keyword k. In the distributed tf·idf method of CSE, idf_dist is used instead of idf. It is defined as follows:



$$ \mathit{idf}_{dist} = \log \frac{N_{total}}{\sum_i n_{k,i}} \tag{3} $$
Here, N_total is the number of documents in the whole CSE (N_total = Σ_i N_i), and n_{k,i} is the number of hit documents in LMSE i.

Although tf can be determined locally by an LSE, N_total and every n_{k,i} are needed to calculate idf_dist. An LMSE receives these values from the LS at search time. In this way, each LMSE calculates scores independently, and the LMSE finally presents the results to the user in order of score. We describe the details in Section 4.
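To make the arithmetic of Eqs. (1) to (3) concrete, the following Python sketch computes idf_dist for one keyword from per-LMSE counts and shows that it coincides with the idf that a single engine would compute over the union of all documents. The function names and the use of the natural logarithm are assumptions; the paper does not fix the log base.

```python
import math


def idf_dist(docs_per_lmse, hits_per_lmse):
    """Distributed idf of one keyword k, following Eq. (3).

    docs_per_lmse -- N_i, number of documents indexed by each LMSE i
    hits_per_lmse -- n_{k,i}, number of documents containing k at each LMSE i
    The natural logarithm is an assumption; the paper does not fix the base.
    """
    n_total = sum(docs_per_lmse)   # N_total = sum_i N_i
    n_k = sum(hits_per_lmse)       # hits for k summed over all LMSEs
    if n_k == 0:
        raise ValueError("keyword does not occur in any indexed document")
    return math.log(n_total / n_k)


def score(tf, idf):
    """Eq. (1): each LMSE multiplies its local tf by the shared idf."""
    return tf * idf


# Two LMSEs give the same idf as one engine holding all 4000 documents.
split = idf_dist([1000, 3000], [10, 30])
single = idf_dist([4000], [40])
print(split == single, round(split, 3))  # True 4.605
```

Because every LMSE plugs the same idf_dist into its local tf, scores from different LMSEs become directly comparable, which is what allows S_0 to merge them later.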

3.3 The Behavior of CSE at Searching 

Figure 1 shows the behavior of CSE at search time. A user enters a query in a web form, and an LMSE receives the query.
Fig. 1. The Behavior of CSE at Searching



The search proceeds as follows. Here, S is the set of LMSEs, S_i ∈ S, and S_0 is the LMSE that receives the query from the user:

1. S_0 receives a query expression E from the user.

2. S_0 optimizes E internally if needed, and sends E to the LS.

3. The LS searches its location database and sorts the S_i by range_i(E) = [tf_min, tf_max] (described in Section 4.2), in descending order. Then N_total, {(k, idf_k) : k ∈ K}, and {(S_i, tf_max, tf_min) : S_i ∈ S, S_i knows k} are returned to S_0. Here, K is the set of keywords appearing in E, and tf_max and tf_min are the maximum and minimum values of tf in S_i.

S_0 initializes the set of search results U to empty and initializes tf_max and tf_min.

4. S_0 sends a search request with E, (k, idf_k), and N_total to each S_i. Steps 4 to 6 are executed in parallel.

5. S_i searches for the set U_i of URLs of documents that match E, and sorts the elements u_j ∈ U_i in descending order of tf. If E is a boolean query expression, S_i calculates the score based on tf·idf and sorts the documents in descending order of score.

6. S_i sends back the result of step 5.

7. S_0 merges the results and shows them in the user's browser.
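Steps 4 to 7 amount to a parallel fan-out followed by a merge of per-LMSE lists that are already sorted by score. The sketch below assumes a hypothetical callback search_lmse that stands in for the network request to each S_i; a thread pool plays the role of the parallel requests, and heapq.merge keeps the merged list in descending score order without re-sorting everything.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def cse_search(query, targets, search_lmse):
    """Sketch of steps 4-7: query every target LMSE in parallel, then merge.

    search_lmse(lmse, query) is a hypothetical callback returning a list of
    (score, url) pairs already sorted by descending score (step 5).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        partial_results = list(pool.map(lambda s: search_lmse(s, query), targets))
    # Step 7: a k-way merge of descending lists keeps the global order.
    return list(heapq.merge(*partial_results, key=lambda r: r[0], reverse=True))


# Hypothetical per-LMSE results, for illustration only.
fake_index = {
    "lmse-a": [(0.9, "http://a/1"), (0.4, "http://a/2")],
    "lmse-b": [(0.7, "http://b/1"), (0.1, "http://b/2")],
}
merged = cse_search("cse", ["lmse-a", "lmse-b"], lambda s, q: fake_index[s])
print([url for _, url in merged])
```

A k-way merge fits here because each S_i already returns its list sorted in step 5.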



3.4 Narrowing Search Targets in LS 

If the LS did not narrow the search targets, i.e. the set of LMSEs to be queried, the communication cost of searching would be very high. Narrowing is therefore required.

For narrowing, the LS holds both the maximum and the minimum score over all documents in each LMSE. The narrowing procedure is as follows:

1. The LS finds the LMSE with the highest tf_max; call it LMSE_h.

2. The LS lists every LMSE_i whose tf_max is greater than the tf_min of LMSE_h.

3. The LS regards LMSE_h and the LMSEs listed above as the search targets. This list of LMSEs is sorted in descending order of tf_max.
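A minimal sketch of the three narrowing steps, assuming the LS already holds (tf_max, tf_min) for each candidate LMSE. The class name LmseStats is illustrative; the pruning test keeps an LMSE only if its best tf can exceed the worst tf of the top-ranked LMSE.

```python
from dataclasses import dataclass


@dataclass
class LmseStats:
    """Per-LMSE statistics held by the LS (names are illustrative)."""
    name: str
    tf_max: float
    tf_min: float


def narrow_targets(stats):
    """Sketch of the Section 3.4 narrowing procedure."""
    if not stats:
        return []
    # Step 1: the LMSE with the highest tf_max.
    top = max(stats, key=lambda s: s.tf_max)
    # Step 2: keep every other LMSE whose best tf exceeds top's worst tf.
    targets = [s for s in stats if s is top or s.tf_max > top.tf_min]
    # Step 3: search targets in descending order of tf_max.
    return sorted(targets, key=lambda s: s.tf_max, reverse=True)


stats = [LmseStats("A", 10, 4), LmseStats("B", 6, 2), LmseStats("C", 3, 1)]
print([s.name for s in narrow_targets(stats)])  # ['A', 'B']; C is pruned since 3 <= 4
```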

Furthermore, when a query expression E includes boolean operators, search 
target LMSEs are determined according to Table 1. 



Table 1. Determination of LMSE in boolean search

Boolean expression  Determined LMSEs
A and B             S_A ∩ S_B
A or B              S_A ∪ S_B
A not B             S_A

S_A and S_B are the sets of sites that have documents including A and B, respectively.
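The rules of Table 1 are plain set operations. The sketch below encodes them for a query given as a keyword string or a nested tuple (op, left, right); applying the table recursively to nested expressions is our assumption, since Table 1 only lists the two-operand cases.

```python
def target_lmses(expr, sites_with):
    """Choose search-target LMSEs for a boolean query, following Table 1.

    expr is a keyword string or a tuple (op, left, right) with op in
    {"and", "or", "not"}; sites_with maps keyword -> set of LMSEs having it.
    Recursive application to nested expressions is an assumption.
    """
    if isinstance(expr, str):
        return set(sites_with.get(expr, ()))
    op, left, right = expr
    s_a = target_lmses(left, sites_with)
    if op == "not":
        return s_a                    # S_A: documents without B may be anywhere in S_A
    s_b = target_lmses(right, sites_with)
    if op == "and":
        return s_a & s_b              # S_A ∩ S_B
    if op == "or":
        return s_a | s_b              # S_A ∪ S_B
    raise ValueError(f"unknown operator: {op}")


sites = {"db": {"s1", "s2"}, "web": {"s2", "s3"}}
print(sorted(target_lmses(("and", "db", "web"), sites)))  # ['s2']
```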



3.5 The Behavior of CSE at Updating 

In CSE, the LS holds location information, i.e. which LMSEs have documents containing a given keyword. This location information must therefore be updated periodically. From the viewpoint of the whole CSE, it is convenient to update the document indexes and the location information at the same time, so CSE provides a mechanism that updates each LMSE's index database and the location information together.

Figure 2 outlines the behavior of CSE at update time. The LS takes the initiative in updating location information, because it is suited to controlling the whole CSE. However, when a new LMSE joins CSE, that LMSE has to notify the LS that it is joining, so an LMSE can also take the initiative in updating location information. The update proceeds as follows:

Fig. 2. The Behavior of CSE at Updating

1. The LS sends LMSE_i a request to update its indexes and to return both keywords and scores.

2. LMSE_i updates its own index database and extracts k, tf_max, tf_min, and the number of indexed documents N_i.

3. When index updating is finished, LMSE_i sends k, tf_max, tf_min, and N_i back to the LS.

4. The LS registers k, tf_max, tf_min, and N_i.
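The LS side of steps 1 to 4 can be pictured as a small registry keyed by keyword. In this minimal sketch, the class and method names are hypothetical, and the choice that a new report from an LMSE replaces everything previously registered for it is our assumption, not something the paper specifies.

```python
class LocationServer:
    """Minimal sketch of the LS registry used by the update protocol.

    Class and method names are hypothetical.
    """

    def __init__(self):
        self.doc_counts = {}   # LMSE -> N_i
        self.location = {}     # keyword k -> {LMSE: (tf_max, tf_min)}

    def register(self, lmse, n_docs, keyword_stats):
        """Step 4: store k, tf_max, tf_min and N_i reported by one LMSE."""
        # Assumption: a new report replaces everything known about this LMSE.
        for per_lmse in self.location.values():
            per_lmse.pop(lmse, None)
        self.doc_counts[lmse] = n_docs
        for keyword, (tf_max, tf_min) in keyword_stats.items():
            self.location.setdefault(keyword, {})[lmse] = (tf_max, tf_min)

    def n_total(self):
        """N_total, the number of documents in the whole CSE."""
        return sum(self.doc_counts.values())

    def lookup(self, keyword):
        """LMSEs that know the keyword, with their (tf_max, tf_min)."""
        return dict(self.location.get(keyword, {}))


ls = LocationServer()
ls.register("lmse-a", 1000, {"search": (12, 1), "engine": (5, 2)})
ls.register("lmse-b", 2000, {"search": (7, 3)})
ls.register("lmse-a", 1100, {"engine": (6, 1)})   # re-registration after an update
print(ls.n_total(), ls.lookup("search"))
```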

4 Scoring in CSE 

4.1 The Differences of Namazu and SGSE 

Currently, both Namazu and SGSE are used as LSEs in CSE. However, as shown in Figure 3, SGSE's scores range from 0.5 to 1.5 times Namazu's. Ideally, every LSE would use the same scoring method, but this is difficult because SGSE's scoring method differs from Namazu's. We therefore adopt a method that adjusts SGSE's scores to Namazu's. We chose Namazu's scoring because Namazu is more popular than SGSE and SGSE does not support tf·idf scoring.



4.2 Scoring at Boolean Search 

In search engines such as Namazu and SGSE, boolean search scores are calculated by combining the simple scores of individual keywords. For example, Namazu calculates the score w(d, E) of a search expression E for a document d as follows:



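$$ w(d, k) = \mathit{tf}(d, k) \times \mathit{idf}(k) \tag{4} $$

$$ w(d, A \text{ and } B) = \min\big(w(d, A),\, w(d, B)\big) \tag{5} $$

$$ w(d, A \text{ or } B) = \max\big(w(d, A),\, w(d, B)\big) \tag{6} $$

$$ w(d, A \text{ not } B) = w(d, A) \tag{7} $$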



Here, tf(d, k) is the frequency of keyword k in document d, idf(k) is the idf value of k, and A and B are boolean expressions.
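Eqs. (4) to (7) translate directly into a recursive scorer. This is a sketch under the assumption that a query is a keyword string or a nested tuple (op, A, B); whether a document matches "A not B" at all is a filtering question handled before scoring, so the scorer only implements Eq. (7).

```python
def namazu_score(expr, tf, idf):
    """Score w(d, E) of one document d, following Eqs. (4)-(7).

    expr: keyword string or (op, A, B) with op in {"and", "or", "not"}
    tf:   dict keyword -> tf(d, k) for this document (0 if absent)
    idf:  dict keyword -> idf(k)
    """
    if isinstance(expr, str):
        return tf.get(expr, 0) * idf[expr]                                # Eq. (4)
    op, a, b = expr
    if op == "and":
        return min(namazu_score(a, tf, idf), namazu_score(b, tf, idf))   # Eq. (5)
    if op == "or":
        return max(namazu_score(a, tf, idf), namazu_score(b, tf, idf))   # Eq. (6)
    if op == "not":
        return namazu_score(a, tf, idf)                                  # Eq. (7)
    raise ValueError(f"unknown operator: {op}")


tf = {"db": 3, "web": 1}
idf = {"db": 2.0, "web": 4.0}
print(namazu_score(("and", "db", "web"), tf, idf))  # 4.0
```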
Fig. 3. The score ratio of Namazu and SGSE

We prepared pairs of keywords and documents that have the same score in Namazu, and then averaged the ratio of the SGSE score to the Namazu score. The documents were selected from http://www.toyonet.toyo.ac.jp/.



SGSE calculates scores with the following expressions:



(8)  [equation not legible in the source]

$$ w(d, A \text{ and } B) = w(d, A) \times w(d, B) \tag{9} $$

$$ w(d, A \text{ or } B) = w(d, A) + w(d, B) \tag{10} $$

$$ w(d, A \text{ not } B) = w(d, A) \tag{11} $$



The quality of SGSE's search results is lower than Namazu's because SGSE does not use the tf·idf method; in particular, w(d, A and B) tends to be too high in SGSE. In CSE, we therefore employ Namazu's scoring method for boolean search. We now introduce range(E), the range of tf for a query expression E in boolean search. For a keyword k and boolean expressions A and B, range(E) is defined as follows:



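$$ \mathrm{range}(k) = [\,\mathit{tf}_{min}(k),\ \mathit{tf}_{max}(k)\,] \tag{12} $$

$$ \mathrm{range}(A \text{ and } B) = [\,\min(\mathrm{range}(A).\mathit{tf}_{min},\ \mathrm{range}(B).\mathit{tf}_{min}),\ \min(\mathrm{range}(A).\mathit{tf}_{max},\ \mathrm{range}(B).\mathit{tf}_{max})\,] \tag{13} $$

$$ \mathrm{range}(A \text{ or } B) = [\,\max(\mathrm{range}(A).\mathit{tf}_{min},\ \mathrm{range}(B).\mathit{tf}_{min}),\ \max(\mathrm{range}(A).\mathit{tf}_{max},\ \mathrm{range}(B).\mathit{tf}_{max})\,] \tag{14} $$

$$ \mathrm{range}(A \text{ not } B) = \mathrm{range}(A) \tag{15} $$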



Here, CSE calculates w(d, E) in the same way as Namazu; tf_min(k) is the minimum tf value over all documents at a given site, and tf_max(k) is the maximum. Furthermore, range(E).tf_min and range(E).tf_max are the minimum and maximum values of range(E), respectively. As described in Section 3.4, range(E) is used to narrow the search target LMSEs.
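Eqs. (12) to (15) can be evaluated per site with a small recursive function; its (tf_min, tf_max) result is exactly what the narrowing procedure of Section 3.4 compares across LMSEs. This sketch assumes the same tuple encoding of queries as above and a per-site dictionary of keyword ranges.

```python
def tf_range(expr, site_range):
    """range(E) = (tf_min, tf_max) for one site, following Eqs. (12)-(15).

    site_range maps keyword k -> (tf_min(k), tf_max(k)) at that site.
    """
    if isinstance(expr, str):
        return site_range[expr]                               # Eq. (12)
    op, a, b = expr
    ra = tf_range(a, site_range)
    if op == "not":
        return ra                                             # Eq. (15)
    rb = tf_range(b, site_range)
    if op == "and":
        return (min(ra[0], rb[0]), min(ra[1], rb[1]))         # Eq. (13)
    if op == "or":
        return (max(ra[0], rb[0]), max(ra[1], rb[1]))         # Eq. (14)
    raise ValueError(f"unknown operator: {op}")


site = {"db": (2, 10), "web": (3, 6)}
print(tf_range(("and", "db", "web"), site))  # (2, 6)
```

The bounds mirror Namazu's min/max combination: an AND score can never exceed the smaller of the two maxima, and an OR score can never fall below the larger of the two minima.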



5 Performance Evaluation of CSE 

5.1 Evaluation Items for Index Updating

In CSE, updating consists of the following processes.

Collecting documents This process collects documents. Generally, centralized search engines need more time for it than distributed search engines.

Making index This process makes indexes. It can be performed in parallel.

Transferring index This process transfers the indexes to the LS. It is needed in CSE but not in centralized search engines.

There are two kinds of index updating methods.

Simple updating Only the indexes of new and changed files are updated; the indexes of deleted files may not be deleted.

Complete updating All indexes are rebuilt from scratch, so the indexes of deleted files are also deleted.

To evaluate index updating under these two methods, we measure the following items.

— Complete Document Collecting Time (T_cg) is the time spent collecting documents in complete updating.

— Simple Document Collecting Time (T_ug) is the time spent collecting documents in simple updating.

— Complete Index Making Time (T_cm) is the time spent making indexes in complete updating.

— Simple Index Making Time (T_um) is the time spent making indexes in simple updating. The time needed to collect documents is not included in T_cm and T_um.

— Complete Index Transfer Time (T_cs) is the time spent transferring all of the indexes.

— Simple Index Transfer Time (T_us) is the time spent transferring the differences of the indexes.

The complete update time T_c and the simple update time T_u are therefore:

$$ T_c = T_{cg} + T_{cm} + T_{cs} $$

$$ T_u = T_{ug} + T_{um} + T_{us} $$
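As a small worked example of these sums, the sketch below converts the h:m:s notation used in Table 2 to seconds and adds the three components. Table 2 reports only the making and transfer times (collection times appear in Figure 4), so the usage passes zero for the collection term and therefore yields only T_cm + T_cs; the helper names are our own.

```python
def hms_to_seconds(text):
    """Convert 'h:m:s' or 'm:s' (the notation of Table 2) to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def update_time(collect, make, transfer):
    """T_c = T_cg + T_cm + T_cs (or T_u = T_ug + T_um + T_us), in seconds."""
    return sum(hms_to_seconds(t) for t in (collect, make, transfer))


# 400-user row of Table 2 with the collection term left out: T_cm + T_cs only.
print(update_time("0:00", "1:35:25", "27:50"))  # 7395
```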



There are three methods for accessing documents when collecting them.

Direct access is a method in which the indexer accesses documents through the file system, e.g. NFS. This is the fastest method.

HTTP access is a method in which the indexer accesses documents over HTTP. In HTTP access, a prefetcher such as GNU wget actually collects the documents instead of the indexer. This method takes the longest time because a communication delay occurs for every file.

Archived access is a method in which the indexer collects documents all at once through a CGI program that archives them; the archive is created by the CGI program installed on the remote web server. This method needs to make and extract archives, but the communication delay per file is shorter than with HTTP access.

In the large majority of LMSEs, indexes can be made using the direct access method.



5.2 The Necessary Time to Make Indexes 

To accelerate index making through concurrency, we developed a message-passing library based on NFS file sharing and built a workstation cluster. This workstation cluster has the following characteristics:

— For the purpose of load balancing, a master process controls some slave 
processes. 

— It uses NFS file sharing to synchronize and to communicate. 
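The paper does not describe the master's balancing policy. One common choice is pull-style scheduling, where the master hands the next indexing task to whichever slave becomes idle first. The sketch below simulates that assumed policy with a heap of slave finish times; the task costs are hypothetical, and an in-process heap stands in for the NFS-based messaging.

```python
import heapq


def schedule(task_costs, n_workers):
    """Assumed master/slave policy: give the next task to the slave that
    becomes idle first. Returns (makespan, {slave: [task indices]}).

    task_costs are hypothetical per-task indexing times.
    """
    idle_at = [(0, w) for w in range(n_workers)]   # (time slave is free, slave id)
    heapq.heapify(idle_at)
    assignment = {w: [] for w in range(n_workers)}
    for task, cost in enumerate(task_costs):
        free_time, worker = heapq.heappop(idle_at)
        assignment[worker].append(task)
        heapq.heappush(idle_at, (free_time + cost, worker))
    makespan = max(t for t, _ in idle_at)
    return makespan, assignment


makespan, assignment = schedule([5, 3, 3, 2, 2, 1], n_workers=2)
print(makespan, assignment)
```

With two slaves the six tasks (16 time units in total) finish at time 8, i.e. perfectly balanced; uneven task sizes generally leave some slaves idle at the end.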

We used 10 workstations (SuperSPARC II 75 MHz, 32 MB memory) for the evaluations described below. All files, including both the HTML documents and their indexes, are stored on an NFS server.

We measured the time needed to update indexes with the workstation cluster, using documents collected from ToyoNet (http://www.toyonet.toyo.ac.jp/). Figure 4 shows the time needed to collect documents, and Table 2 shows the time needed to make indexes. Table 2 also shows the time labeled "mknmz", which is the time spent by Namazu's indexer, mknmz. In this case, mknmz makes indexes after a prefetcher has collected the documents, and these processes are executed sequentially.



Table 2. The evaluation of index updating at ToyoNet

Users  Files  mknmz     Complete update time          Simple update time
                        Make (T_cm)  Transfer (T_cs)  Make (T_um)  Transfer (T_us)
50     1283   1:37:03   13:55        8:50             1:04         0:33
100    2355   4:18:36   18:41        12:26            1:33         0:35
200    3915   6:32:49   43:17        15:14            2:28         0:53
300    7427   18:05:23  1:06:54      23:01            4:19         1:07
400    10238  29:57:05  1:35:25      27:50            5:22         1:18

All times are in [h:m:s].
Fig. 4. The evaluation of getting documents from ToyoNet



As shown in Figure 4, we obtained a 7- to 18-fold speedup compared with sequential index making, and more than a 10-fold speedup over mknmz. This is because mknmz copies the large index file on every run.
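One way to check the quoted speedup range against Table 2 is to divide the mknmz time by the cluster's complete index making time T_cm for each row. This reading of "7 to 18 times" is our interpretation; the sketch only recomputes ratios from the published table values.

```python
def hms_to_seconds(text):
    """Convert Table 2's 'h:m:s' or 'm:s' notation to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# (users, mknmz, T_cm) copied from Table 2.
rows = [
    (50, "1:37:03", "13:55"),
    (100, "4:18:36", "18:41"),
    (200, "6:32:49", "43:17"),
    (300, "18:05:23", "1:06:54"),
    (400, "29:57:05", "1:35:25"),
]

speedups = {users: hms_to_seconds(mknmz) / hms_to_seconds(t_cm)
            for users, mknmz, t_cm in rows}
for users, ratio in speedups.items():
    print(f"{users:>3} users: {ratio:.1f}x")
```

The ratios come out at about 7.0, 13.8, 9.1, 16.2 and 18.8, which is consistent with the 7- to 18-fold range stated above.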



5.3 Execution Time to Search 

As CSE grows, the number of LMSEs increases, so we investigated the effect of increasing the number of LMSEs. Table 3 shows the average search times. First, we randomly chose 100 words whose tf value is greater than 10. Then, to evaluate 1-word search, we searched for words selected from these 100 words. The values in Table 3 are averages over 100 searches, with the number of LMSEs varied. The average number of search results for these 100 words is 26.8 items. When Namazu is used on the same documents, the average search time is 0.08 seconds.



Table 3. The evaluation of searching

Number of LMSEs  1 word  AND of 2 words  OR of 2 words
1                1.04    1.00            1.24
2                1.32    1.21            1.93
5                1.73    1.51            2.65
10               2.59    2.24            4.36
20               3.56    3.23            6.01

Unit: [sec]
As shown in Table 3, even when CSE grows and the number of LMSEs increases, the search time does not increase much: going from 1 to 20 LMSEs raises the 1-word search time from 1.04 to 3.56 seconds. Because of the two-pass search and communication delay, CSE is slower than Namazu at searching, but it remains within a practical range. A 2-word AND search is faster than a 1-word search because the search target LMSEs are minimized so that unneeded requests are not sent. A 2-word OR search takes longer than the others because its search targets are not minimized.

6 Conclusions 

In this paper, we proposed CSE, a search engine suited to intranets. CSE reduces the update interval to between 1/15 and 1/5 of that of centralized search engines. In our experiments, CSE could update indexes within one minute at the shortest. Thanks to both narrowing and parallel processing, CSE's search delay is nearly equal to that of centralized search engines. Scoring is one of the problems in distributed information retrieval systems; in CSE, tf·idf-based scoring is correctly realized in a distributed fashion.

Therefore we believe CSE is better than traditional centralized search en- 
gines. 

Our future work is as follows:

— We plan to introduce a cache server to prevent traffic from concentrating on the LS.

— To apply CSE to the Internet, we need a hierarchical LS that is allocated in each domain and also behaves as a cache server.

References 

1. Hajime Baba, “List of the full-text retrieval software which can handle Japanese properly”, http://www.kuastro.kyoto-u.ac.jp/~baba/wais/othersystem.html

2. Hayato Yamana, “Trends of WWW Search Engines”, http://www.etl.go.jp/~yamana/Research/WWW/survey.html

3. Japan Electronic Industry Development Association, “Report for next generation distributed information retrieval system”, http://www.jeida.or.jp/committee/jisedai/top.html

4. Nippon Telegraph and Telephone Corporation, Socio Network Computing Research Project, “Global Information Grid”, http://www.ingrid.org/

5. Miguel Rio, Joaquim Macedo, Vasco Freitas, “A Distributed Weighted Centroid-based Indexing System”

6. C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, Michael F. Schwartz, “The Harvest Information Discovery and Access System”, Computer Networks and ISDN Systems, Vol. 28, pp. 119-125 (1995)

7. “WebAnts”, http://polarbear.eng.pgh.lycos.com/webants/

8. Satoru Takabayashi, “Namazu: a full text retrieval search system”, http://www.namazu.org/

9. “Sony Drive Search Engine”, http://www.sony.co.jp/sd/Search/SGSE-DL.html

10. “FreshEye”, http://www.fresheye.com/




Information Retrieval Method for Updated Information System 



199 



11. “Infoseek”, http://www.infoseek.com/

12. “inktomi”, http://www.inktomi.com/

13. Takashi Yamamoto, Nobuyoshi Sato, Yoshihiro Nishida, Minoru Uehara, Hideki Mori, “Cooperative Search Engine”, DICOMO99, pp. 169-174 (1999)

14. Yoshihiro Nishida, Takashi Yamamoto, Nobuyoshi Sato, Minoru Uehara, Hideki Mori, “Cooperative Search Method in Distributed Search Engine”, SwoPP99, pp. 87-92 (1999)




Blocking Reduction in Two-Phase Commit 
Protocol with Multiple Backup Sites 



P. Krishna Reddy and Masaru Kitsuregawa 

Institute of Industrial Science, The University of Tokyo 
7-22-1, Roppongi, Minato-ku, Tokyo 106-8558, Japan 
{reddy, kitsure}@tkl.iis.u-tokyo.ac.jp



Abstract. The two-phase commit (2PC) protocol (or a variation of it) is widely employed for commit processing in distributed database systems (DDBSs). Blocking in 2PC reduces the availability of the system, because blocked transactions hold all their resources until they receive the final command from the coordinator after its recovery. The three-phase commit (3PC) protocol was proposed to remove this blocking problem. Although 3PC eliminates blocking, it involves an extra round of message transmission, which degrades performance in DDBS (Internet) environments.

To reduce blocking, we propose a backup commit (BC) protocol that attaches multiple backup sites to the coordinator site. In this protocol, after receiving responses from all participants in the first phase, the coordinator communicates the final decision to the backup sites in the backup phase. Afterwards, it sends the final decision to the participants. When blocking occurs due to the failure of the coordinator site, the participant sites can terminate the transaction by consulting a backup site of the coordinator. In this way, the BC protocol achieves the non-blocking property in most coordinator site failures.

The BC protocol is best suited to World Wide Web (or Internet) environments, where a server must handle a heavy rush of electronic commerce transactions involving multiple participants. Moreover, in the Internet environment, sites fail frequently and messages take longer to deliver. In this situation, the BC protocol uses extra hardware to reduce the blocking problem without the expensive extra communication cycle of 3PC. Simulation experiments show that the BC protocol exhibits superior throughput and response time compared with the 3PC protocol and performs close to the 2PC protocol.

Keywords: Distributed database, commit protocol, two-phase commit, three-phase commit, non-blocking protocols, distributed algorithms.



1 Introduction 

The two-phase commit (2PC) protocol [7] (or a variation of it) is widely employed for commit processing in distributed database systems (DDBSs). In DDBSs, a transaction blocks during two-phase commit processing if the coordinator site fails while some participant site has declared itself ready to commit the transaction. In this situation, to terminate the blocked transaction, the participant site must wait for the recovery of the coordinator. The blocked transactions hold all their resources until they receive the final command from the coordinator after its recovery. Thus, blocking reduces the availability of the system. To eliminate this inconvenience, the three-phase commit (3PC) protocol [14] was proposed. However, 3PC involves an additional round of message transmission to achieve the non-blocking property. If 3PC is employed to eliminate the blocking problem, this extra round of message transmission further reduces the system's performance compared with 2PC. Especially in DDBS environments, where site failures are frequent and message transmission times are long, neither 2PC, with its blocking problem, nor 3PC, with its performance degradation, is efficient for commit processing.

S. Bhalla (Ed.): DNIS 2000, LNCS 1966, pp. 200-215, 2000.

In this paper, we propose a backup commit (BC) protocol that adds a backup phase to the 2PC protocol. In this protocol, multiple backup sites are attached to a coordinator. After getting the responses from all the participants in the first phase, the coordinator communicates the final decision only to its backup sites in the backup phase. Afterwards, it sends the final decision to the participants. When blocking occurs due to a failure of the coordinator site, the participant sites consult the coordinator's backup sites to resolve the blocking. With a small overhead over 2PC (the time required to communicate with the backup sites), the BC protocol resolves blocking in most coordinator failures. Compared with 3PC, it reduces both the number of messages and the latency of the second phase, and thus exhibits superior performance. However, in the worst case, blocking still occurs with the BC protocol if the coordinator and all of its backup sites fail simultaneously. If such a rare case happens, the participants wait until either the coordinator or one of its backup sites recovers. The BC protocol is best suited to DDBS environments (electronic commerce) in which long transmission delays and frequent site failures occur.
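The following simplified Python sketch illustrates how a blocked participant can be unblocked in the BC protocol. Message passing is collapsed into direct writes, votes are booleans, and the class and function names are hypothetical. The case in which the coordinator fails before the backup phase completes is deliberately not resolved here: the sketch simply reports that the participant must keep waiting, which is a conservative assumption rather than the paper's termination rule.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    up: bool = True
    decisions: dict = field(default_factory=dict)   # transaction id -> "commit"/"abort"


def bc_coordinator(txn, votes, coordinator, backups, participants,
                   crash_before_phase2=False):
    """Decide, record the decision at the backup sites (backup phase), then
    send it to the participants. crash_before_phase2 simulates a coordinator
    crash right after the backup phase."""
    decision = "commit" if all(votes) else "abort"
    coordinator.decisions[txn] = decision
    for b in backups:
        if b.up:
            b.decisions[txn] = decision
    if crash_before_phase2:
        coordinator.up = False
        return decision
    for p in participants:
        p.decisions[txn] = decision
    return decision


def resolve_blocked(txn, coordinator, backups):
    """A blocked participant asks the coordinator, then each backup site.
    Returns None if no reachable site knows the decision (keep waiting)."""
    for site in (coordinator, *backups):
        if site.up and txn in site.decisions:
            return site.decisions[txn]
    return None


coord, b1, b2, p1 = Site("C"), Site("B1"), Site("B2"), Site("P1")
bc_coordinator("t1", [True, True], coord, [b1, b2], [p1], crash_before_phase2=True)
print(resolve_blocked("t1", coord, [b1, b2]))  # commit, learned from a backup site
```

Blocking persists only when the coordinator and every backup site are down at the same time, which matches the worst case described above.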

Recently, commit processing has attracted strong attention because of its effect on transaction processing performance. In [11], a simulation model was used to show that distributed commit processing can have more influence on throughput than distributed data processing. It has been shown in [13] that commit time accounts for one third of the transaction duration in a general-purpose database. In [3], experimental studies were reported on the behavior of concurrency control and commit algorithms in wide area network environments, showing that commit time can be as high as 80 percent of the transaction time in wide area network (Internet) environments. In [11,3], it has been reported that, compared with the 2PC protocol, performance is further degraded with the 3PC protocol because of the extra round of message transmission. To reduce the extent of blocking, the quorum-based 3PC protocol [14] was proposed, which maintains consistency in spite of network partitions. In [9], an enhanced 3PC protocol is proposed that is more resilient to network partitioning failures than the quorum-based 3PC protocol.




202 P.K. Reddy and M. Kitsuregawa 



To deal with coordinator failure, SDD-1 [6] uses backup processes. These processes are initiated by the coordinator before it starts the commit protocol, and they substitute for the coordinator if it fails. To ensure that only one process substitutes for the coordinator, the backups are linearly ordered: the first one "looks" at the coordinator, the second "looks" at the first one, and so on. If backup k fails, backup k+1 starts looking at backup k-1. "Looking" here means periodically sending control messages. The commit protocol with backups consists of four phases. In the first phase, the coordinator establishes n linearly ordered backups, and each backup is informed of the participants' identities. In the second phase, the coordinator sends the updates to the participants. In the third phase, the coordinator communicates its decision to the backups. In the fourth phase, the coordinator sends its decision to the participants.
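The "looking" chain of SDD-1 can be computed from the ordered list of processes and the set of failed ones: each live backup watches the nearest live process before it, so a failed backup k is skipped and backup k+1 watches k-1. In this sketch, the function name and the None marker for a backup with no live predecessor (our reading: the one that would substitute for the coordinator) are assumptions.

```python
def watch_targets(chain, failed):
    """Who 'looks' at whom in SDD-1's linearly ordered backups.

    chain[0] is the coordinator and chain[1:] are the backups in order;
    failed is the set of processes that are down. Each live backup watches
    the nearest live process before it; None marks a backup with no live
    predecessor (assumed to be the one that substitutes for the coordinator).
    """
    targets = {}
    for i in range(1, len(chain)):
        if chain[i] in failed:
            continue
        j = i - 1
        while j >= 0 and chain[j] in failed:
            j -= 1
        targets[chain[i]] = chain[j] if j >= 0 else None
    return targets


chain = ["C", "B1", "B2", "B3"]
print(watch_targets(chain, failed={"B2"}))  # {'B1': 'C', 'B3': 'B1'}
```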

In this paper, we use a notion of backup site similar to the backup process of [6]. However, in the BC protocol, the backup site does not assume the role of the coordinator when the coordinator fails. Also, there is no periodic exchange of control messages between the coordinator and its backup site. Instead, the participant sites themselves resolve the blocking by contacting the backup site. The coordinator and its backup site fail independently. This work is motivated by the fact that 2PC is widely applied in commercial database systems, whereas 3PC, even though it eliminates blocking, has not entered commercial database systems. We have therefore tried to resolve the blocking problem of the 2PC protocol by employing extra hardware. In addition, the proposed protocol can easily be integrated with existing 2PC implementations.

The BC protocol is best suited to World Wide Web (or Internet) environments [18], where a server runs electronic commerce transactions involving multiple participants. In an Internet environment, sites fail frequently and messages take longer to deliver. In this situation, by adding extra hardware, the BC protocol reduces the blocking problem and improves the performance of commit processing compared with 3PC. Simulation experiments show that the BC protocol exhibits superior throughput and response time compared with the 3PC protocol and performs close to the 2PC protocol.

The proposed protocol is an extension of the protocol proposed in [8], in which only one backup site is attached to a coordinator. In this paper, we present a generalized version of the BC protocol in which multiple backup sites can be attached to a coordinator to reduce the blocking problem without affecting performance. We have also extended the analysis and discussion accordingly.

The paper is organized as follows. In Section 2, we explain the system model. In Section 3, we briefly explain the 2PC and 3PC protocols. In Section 4, we propose the BC protocol. In Section 5, we discuss performance issues. In Section 6, we present the results of simulation experiments. The last section contains a summary and conclusions.
2 System Model

The DDBS consists of a set of data objects. A data object is the smallest accessible unit of data, and each data object is stored at one site only. Transactions are represented by T_i, T_j, ..., and sites by S_i, S_j, ..., where i, j, ... are integers. The data objects are stored at database sites connected by a computer network. Each database site S_i has a backup site, represented by BS_i. The originating site of T_i acts as the coordinator in T_i's commit processing. The sites involved in the processing of T_i are called participants of T_i; the coordinator site of T_i also acts as a participant.

Each site works as both a transaction manager and a data manager. The transaction manager supervises the processing of transactions, while the data managers manage individual databases. We assume that the network is prone to both link and site failures. When a site fails, it simply stops running, and other sites detect this fact. We also assume that network partitioning can occur; that is, in the event of a network partitioning failure, the operational sites are divided into two or more groups such that any two sites within a group can communicate with each other, but sites in different groups cannot. The communication medium is assumed to provide message transfer between sites. When a site has to send a message to some other site, it hands the message over to the communication medium, which delivers it to the destination site in finite time. We assume that, for any pair of sites S_i and S_j, the communication medium always delivers messages to S_j in the same order in which S_i handed them to the medium.

3 Distributed Commit Protocols: 2PC and Quorum-Based 3PC

In the literature, a variety of commit protocols have been proposed, most of which are based on the 2PC protocol. The most popular variants of 2PC are the presumed abort and presumed commit protocols [10]. Other protocols include the early prepare [13], coordinator log [16], and implicit yes vote [2] protocols. Different communication paradigms can also be used to implement 2PC. The one described below is called centralized 2PC, since communication takes place only between the coordinator and the participants; the participants do not communicate among themselves. We do not discuss the other protocols in this paper because they are not concerned with the blocking problem.

Detailed explanations of the termination and recovery protocols of 2PC and 3PC for failures such as coordinator timeouts, participant timeouts, and participant failures can be found in [17,?]. In this section, we briefly explain the 2PC and quorum-based 3PC protocols.







3.1 Two-Phase Commit Protocol 

In DDBSs, 2PC extends the effects of local atomic commit actions to distributed transactions by insisting that all sites involved in the execution of a distributed transaction agree to commit the transaction before its effects are made permanent. A brief description of 2PC, ignoring failures, is as follows. The coordinator (the originating site of a transaction) writes a begin-commit record in its log, sends a PREPARE message to all participating sites, and enters the wait state. When a participant receives the PREPARE message, it checks whether it can commit the transaction. If so, the participant writes a ready record in its log, sends a VOTE_COMMIT message to the coordinator, and enters the ready state. Otherwise, the participant writes an abort record and sends a VOTE_ABORT message to the coordinator; having decided to abort, the site can forget about the transaction. The coordinator aborts the transaction globally if it receives a VOTE_ABORT message from even one participant: it writes an abort record, sends a GLOBAL_ABORT message to all participant sites, and enters the abort state. Otherwise, it writes a commit record, sends a GLOBAL_COMMIT message to all participants, and enters the commit state. The participants either commit or abort the transaction according to the coordinator's instructions and send back an ACK (acknowledgment) message, at which point the coordinator terminates the transaction by writing an end-of-transaction record in its log.
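The failure-free flow just described can be condensed into a few lines. This sketch collapses messages into a dictionary of local decisions and returns the global message together with the coordinator's log records; the record names are illustrative, not taken from a particular implementation.

```python
def two_phase_commit(can_commit):
    """Failure-free centralized 2PC as described in Section 3.1.

    can_commit maps participant name -> bool (its local answer to PREPARE).
    Returns (global message, votes, coordinator log records).
    """
    log = ["begin_commit"]                     # coordinator logs, then sends PREPARE
    votes = {p: ("VOTE_COMMIT" if ok else "VOTE_ABORT") for p, ok in can_commit.items()}
    if all(v == "VOTE_COMMIT" for v in votes.values()):
        log.append("commit")
        message = "GLOBAL_COMMIT"
    else:                                      # a single VOTE_ABORT forces a global abort
        log.append("abort")
        message = "GLOBAL_ABORT"
    # Participants apply `message` and send ACKs; once all ACKs arrive,
    # the coordinator writes the end-of-transaction record.
    log.append("end_of_transaction")
    return message, votes, log


print(two_phase_commit({"S1": True, "S2": False})[0])  # GLOBAL_ABORT
```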

Blocking problem 

In the 2PC protocol, consider a participant that has sent a VOTE_COMMIT message to the coordinator but, because of the coordinator's failure, has received neither a GLOBAL_COMMIT nor a GLOBAL_ABORT message. All such participants are blocked until the coordinator recovers and they can obtain the termination decision.

Partitioning failure

Consider a simple partition that divides the sites into two groups: the group containing the coordinator, called the coordinator group, and the other, called the participant group. In 2PC, this case is equivalent to the participants' failure from the coordinator's point of view; after the time-out period, the coordinator terminates the transaction by following the termination protocols. For the participants in the participant group, however, it is equivalent to the coordinator's failure, so they wait until the partition is repaired to learn the termination decision. Thus, the 2PC protocol terminates transactions consistently in the case of a partitioning failure.

3.2 Quorum Based 3PC Protocol 

The 3PC protocol eliminates the blocking problem. A brief description of the quorum-based 3PC protocol is as follows.
Every site in the system is assigned a vote V_i. Assume that the total number of votes in the system is V and that the abort and commit quorums are V_a and V_c, respectively. The protocol must obey the following rules.

1. V_a + V_c > V, where V_a > 0 and V_c > 0.

2. Before a transaction commits, it must obtain a commit quorum V_c.

3. Before a transaction aborts, it must obtain an abort quorum V_a.
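Rules 1 to 3 reduce to two small checks: whether a quorum configuration is legal, and whether a set of responding sites reaches a given quorum. The vote assignment in the usage example is hypothetical.

```python
def valid_quorums(site_votes, v_a, v_c):
    """Rule 1: V_a + V_c > V, with V_a > 0 and V_c > 0."""
    total = sum(site_votes.values())
    return v_a > 0 and v_c > 0 and v_a + v_c > total


def has_quorum(responding, site_votes, quorum):
    """Rules 2 and 3: the votes of the responding sites must reach the quorum."""
    return sum(site_votes[s] for s in responding) >= quorum


votes = {"S1": 1, "S2": 1, "S3": 1}            # hypothetical assignment, V = 3
print(valid_quorums(votes, v_a=2, v_c=2), valid_quorums(votes, v_a=1, v_c=2))  # True False
```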

The abort case is similar to 2PC: the coordinator aborts the transaction globally if it receives a VOTE_ABORT message from even one participant. The commit case, however, is different. If the coordinator receives VOTE_COMMIT messages from all participants, it writes a prepare_to_commit record, sends a PREPARE_TO_COMMIT message to all the participants, and enters a new pre_commit state. On receiving this message, a participant writes a prepare_to_commit record, sends a READY_TO_COMMIT message, and enters the pre_commit state. Finally, when the coordinator receives READY_TO_COMMIT messages and the sum of the votes of the responding sites equals or exceeds V_c, it writes a commit record, sends a GLOBAL_COMMIT message to all participants, and enters the commit state.

Elimination of blocking 

We briefly explain how the 3PC protocol eliminates the blocking problem by dividing the situation into two cases. In the first case, a participant has sent the VOTE_COMMIT message but has not received the PREPARE_TO_COMMIT message because of the coordinator's failure. In the second case, a participant has received the PREPARE_TO_COMMIT message but has not received the GLOBAL_COMMIT message because of the coordinator's failure.

In both cases, the operational participants elect a new coordinator. The new coordinator collects the states of all the sites and tries to resolve the transaction. If any site has already committed or aborted, the transaction is immediately committed or aborted accordingly. Otherwise, the coordinator tries to establish a quorum. It commits the transaction if at least one site is in the pre_commit state and the sites in the wait state, together with the sites in the pre_commit state, form a Commit quorum. It aborts the transaction if the sites in the wait state, together with the sites in the pre_abort state, form an Abort quorum.

Network partitioning 

The quorum-based 3PC protocol is resilient to partitioning failures. When a network partition occurs, each partition elects a new coordinator, assuming that the other sites are down. To ensure that all coordinators reach the same decision, a coordinator must explicitly establish a Commit quorum ($V_c$) to commit, or an Abort quorum ($V_a$) to abort. Otherwise, the sites wait until the partitions merge. In this way, consistency is achieved.
4 Backup Commit Protocol

In this section, we first present the BC protocol. The BC protocol can employ the same termination and recovery protocols as 2PC for the various types of failures (coordinator timeouts, participant timeouts, and participant failures); only the blocking case is handled differently. Next, we explain the termination and recovery protocols for the blocking case. Finally, we discuss the behavior of the BC protocol under partitioning failures.

4.1 Protocol 

Suppose there are n sites in the DDBS. In this approach, k backup sites BS(i,j) (j = 1, ..., k) are attached to each site Si. Each backup site fails independently of the coordinator and is connected to it in shared-nothing mode. Let backup_set_i be the set of identities of the backup sites of Si. The first and third phases of the BC protocol are similar to the first and second phases of 2PC, respectively. We present the BC protocol as follows.

First phase 

Coordinator: The coordinator writes a begin_commit record in the log, sends a PREPARE message along with the backup_set_i information to all participating sites, and enters the wait state.

Participant: When a participant receives the PREPARE message, it stores the backup_set information and checks whether it can commit the transaction. If so, the participant writes a ready record in the log, sends a VOTE_COMMIT message to the coordinator, and enters the ready state. Otherwise, the participant writes an abort record and sends a VOTE_ABORT message to the coordinator.

Second phase 

Coordinator: If the coordinator (Si) receives VOTE_COMMIT messages from all participants, it writes a decided_to_commit record on its stable storage and sends a DECIDED_TO_COMMIT message to all BS(i,j), where j = 1 to k.

Otherwise, if it receives a VOTE_ABORT message from even one participant, it writes a decided_to_abort record on its stable storage and sends a DECIDED_TO_ABORT message to all BS(i,j), where j = 1 to k.

Backup site BS(i,j): If BS(i,j) receives a DECIDED_TO_COMMIT message, it writes a recorded_commit record on its stable storage and sends a RECORDED_COMMIT message back to Si.

Otherwise, if it receives a DECIDED_TO_ABORT message, BS(i,j) (1 ≤ j ≤ k) writes a recorded_abort record on its stable storage and sends a RECORDED_ABORT message back to Si.
Third phase

Coordinator:

• On receiving a RECORDED_COMMIT message from any BS(i,j), where 1 ≤ j ≤ k, the coordinator writes a commit record on the stable storage and sends a GLOBAL_COMMIT message to all participants.

• On receiving a RECORDED_ABORT message from any BS(i,j), where 1 ≤ j ≤ k, the coordinator writes an abort record on the stable storage and sends a GLOBAL_ABORT message to all participants.

• If the coordinator does not receive any response from the backup sites, then:

  • If it is able to confirm that no backup site has received the DECIDED_TO_COMMIT (DECIDED_TO_ABORT) message, and hence that no backup site has recorded the decision (for example, because the backup sites are unreachable), it can safely abort the transaction.

  • If it is unable to confirm whether a backup site has received the DECIDED_TO_COMMIT (DECIDED_TO_ABORT) message, it follows the recovery protocol (see Section 4.3) for that transaction.

Participant: The participant follows the coordinator's instructions and sends an acknowledgment message back to the coordinator.

Coordinator: After receiving acknowledgment messages from all participants, the coordinator writes an end_of_transaction record on the stable storage.
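The coordinator's side of the second and third phases can be sketched in Python as follows. This is a single-process illustration, not the authors' implementation: message passing is replaced by direct method calls, stable storage by Python lists, and the names `BackupSite`, `on_decided`, and `coordinator_decide` are hypothetical. When no backup site replies, the sketch always takes the conservative branch (follow the recovery protocol), because a missing reply alone does not let the coordinator confirm that the decision was never recorded.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BackupSite:
    """Hypothetical in-memory stand-in for a backup site BS(i,j)."""
    up: bool = True
    stable_log: list[str] = field(default_factory=list)

    def on_decided(self, decision: str) -> Optional[str]:
        """Handle DECIDED_TO_COMMIT / DECIDED_TO_ABORT; None models no reply."""
        if not self.up:
            return None
        self.stable_log.append(f"recorded_{decision}")  # recorded_commit / recorded_abort
        return f"RECORDED_{decision.upper()}"

def coordinator_decide(votes: list[str], backups: list[BackupSite],
                       coord_log: list[str]) -> str:
    """Second and third phases of BC at the coordinator Si.

    Returns the global message to send to the participants, or
    "RUN_RECOVERY" when no backup site answered.
    """
    # Second phase: commit only if every participant voted VOTE_COMMIT.
    decision = "commit" if all(v == "VOTE_COMMIT" for v in votes) else "abort"
    coord_log.append(f"decided_to_{decision}")         # written before contacting backups
    # Send DECIDED_TO_<decision> to every BS(i,j); a real system sends these in parallel.
    replies = [b.on_decided(decision) for b in backups]
    # Third phase: one RECORDED_<decision> reply from any backup site suffices.
    if f"RECORDED_{decision.upper()}" in replies:
        coord_log.append(decision)                      # commit / abort record
        return f"GLOBAL_{decision.upper()}"
    # No reply: a crashed backup cannot be told apart from a partitioned one.
    return "RUN_RECOVERY"

log: list[str] = []
backups = [BackupSite(up=False), BackupSite()]  # one failed, one operational backup
print(coordinator_decide(["VOTE_COMMIT", "VOTE_COMMIT"], backups, log))  # GLOBAL_COMMIT
print(log)  # ['decided_to_commit', 'commit']
```

A single surviving backup site is enough for the coordinator to proceed, which is why the protocol tolerates the failure of up to k - 1 backup sites without extra delay.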

4.2 Termination Protocols 

If blocking occurs because of the failure of the coordinator site (Si), the blocked participant contacts the BS(i,j) (1 ≤ j ≤ k) in the backup_set. Note that a backup site may be in any one of four states: it contains recorded_commit information, contains recorded_abort information, contains no information, or is down.

In case of blocking, the participant first contacts the other participants to resolve the transaction's fate. If it is unable to obtain the status of the transaction in this way, it contacts all BS(i,j), where j = 1 to k.

1. If any backup site contains a recorded_commit record, the participant commits the transaction.

2. If any backup site contains a recorded_abort record, the participant aborts the transaction.

3. If all backup sites are up and none of them has information about the transaction, the participant waits for the recovery of the coordinator.

4. If some backup sites are down and the backup sites that are up have no information, the participant waits for the recovery of the failed backup sites. After their recovery, one of the preceding steps is followed.
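Rules 1 to 4 amount to a small decision function over the states reported by the backup sites. The following is a minimal Python sketch with hypothetical names (`BackupState`, `terminate_blocked`); it assumes the participant has already failed to learn the outcome from the other participants. Rules 1 and 2 can never both apply, because the coordinator sends only one decision to its backup sites.

```python
from enum import Enum

class BackupState(Enum):
    """The four states a backup site may be in."""
    RECORDED_COMMIT = "recorded_commit"
    RECORDED_ABORT = "recorded_abort"
    NO_INFORMATION = "no_information"
    DOWN = "down"

def terminate_blocked(backup_states: list[BackupState]) -> str:
    """Rules 1-4 applied by a blocked participant after polling all BS(i,j)."""
    if BackupState.RECORDED_COMMIT in backup_states:   # rule 1
        return "commit"
    if BackupState.RECORDED_ABORT in backup_states:    # rule 2
        return "abort"
    if BackupState.DOWN in backup_states:              # rule 4: retry once backups recover
        return "wait_for_backup_recovery"
    return "wait_for_coordinator_recovery"             # rule 3: all up, none informed

print(terminate_blocked([BackupState.DOWN, BackupState.RECORDED_COMMIT]))  # commit
print(terminate_blocked([BackupState.NO_INFORMATION, BackupState.DOWN]))   # wait_for_backup_recovery
```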



4.3 Recovery Protocol for Coordinator 

When the coordinator recovers from the failure, it may be in one of the following 
states. 
I. The coordinator finds a begin_commit record but no decided_to_commit record.

In this case, it can safely abort the transaction without contacting the participants, because the participants either might have aborted the transaction by contacting the backup site or are still in the ready state.

II. The coordinator finds a decided_to_commit record but no commit record.

In this case, there are three possibilities. First, the backup site might have failed before receiving the DECIDED_TO_COMMIT message from the coordinator. Second, the coordinator might have failed after writing the decided_to_commit record but before sending the DECIDED_TO_COMMIT message to the backup site. Third, the coordinator might have failed after the backup site received the DECIDED_TO_COMMIT message.

• After recovery, the coordinator contacts all the backup sites. If no information about the transaction exists, the coordinator sends GLOBAL_ABORT messages to all the participants; at worst, a participant might have aborted the transaction by contacting the backup site, and otherwise it is still in the ready state. If, instead, the coordinator finds a recorded_commit record at any one of the backup sites, it sends GLOBAL_COMMIT messages to all the participants; at worst, a participant might have committed the transaction by contacting the backup site, and otherwise it is still in the ready state.

• If, after recovery, the coordinator is unable to contact the backup sites, it has two options. First, it can wait for the recovery of all the backup sites and then follow the above recovery protocol. Second, it can ask all the participants to report the transaction's state. In this case, we assume that the participant distinguishes recovery messages from normal messages; that is, after responding to the coordinator's recovery messages, the participant will no longer contact the backup site about the corresponding transaction. Thus, even if the coordinator fails during recovery, re-executing the recovery protocol makes no difference.

III. The coordinator finds a commit record but no end_of_transaction record.

In this case, it can safely re-send the GLOBAL_COMMIT message to all participants because, at worst, the participants either might have committed the transaction by contacting the backup sites or are still in the ready state.
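The three cases can be read as a decision on the last record the coordinator finds on its stable storage. This is a minimal Python sketch: the record names follow the text, while the function name, the string results, the convention that `None` means "no backup site reachable", and the assumption that a list covers all k backup sites are hypothetical. In case II the sketch reports both fallback options as one action rather than choosing between them.

```python
from typing import Optional

def recovery_action(log: set[str], backup_records: Optional[list[str]]) -> str:
    """Coordinator recovery in BC, driven by the records on its stable storage.

    backup_records: records about the transaction found at the backup sites,
    or None if no backup site could be contacted (assumed to cover all k sites).
    """
    if "end_of_transaction" in log:
        return "nothing to do"
    if "commit" in log:                                  # case III
        return "resend GLOBAL_COMMIT"
    if "decided_to_commit" in log:                       # case II
        if backup_records is None:
            return "wait for backup sites or poll participants"
        if "recorded_commit" in backup_records:
            return "send GLOBAL_COMMIT"
        return "send GLOBAL_ABORT"                       # no backup site holds the decision
    if "begin_commit" in log:                            # case I
        return "abort"
    return "nothing to do"

print(recovery_action({"begin_commit", "decided_to_commit"}, ["recorded_commit"]))
print(recovery_action({"begin_commit", "decided_to_commit"}, []))
```

The first call sends GLOBAL_COMMIT and the second GLOBAL_ABORT; note that the coordinator never needs the participants' states unless the backup sites are unreachable.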



4.4 Backup Site Failure and Network Partitioning 

Consider a situation in which the partition occurs after the coordinator has sent the DECIDED_TO_COMMIT message to the backup sites. As a result, the coordinator does not receive the reply to the DECIDED_TO_COMMIT message from its backup site. There are two possibilities: either the backup site is down, or the network has partitioned such that the backup site and the coordinator fall in different groups. If the coordinator unilaterally either aborts or commits the transaction, inconsistency may occur. If the partition occurs after the backup site has received the DECIDED_TO_COMMIT message, the participants in the backup site's group commit the transaction by contacting the backup site. If, instead, the partition occurs before the backup site receives the DECIDED_TO_COMMIT message, the participants in the backup site's group abort the transaction by contacting the backup site. Therefore, when the coordinator fails to get a response to the DECIDED_TO_COMMIT message from the backup site, it follows the recovery protocol. Similarly, the participants follow the termination protocols. The recovery and termination protocols thus ensure consistency in case of partitioning.

5 Performance 

In this section, we first analyze the non-blocking behavior of the BC protocol. 
Next, we discuss overheads and benefits of the BC protocol. 



5.1 Analysis of Blocking 

Reliability of a module is statistically quantified as its mean time to failure (MTTF), and its service interruption as its mean time to repair (MTTR). Module availability is then quantified as

$$\frac{MTTF}{MTTF + MTTR}$$

Let $MTTF_c$ and $MTTR_c$ represent the MTTF and MTTR of the coordinator site, respectively, and let $MTTF_b$ represent the MTTF of the corresponding backup site. Since the backup site and the coordinator are failure independent, the probability that the backup site fails while the corresponding coordinator is down is calculated as follows.

The probability that the coordinator site is unavailable is:

$$P_c = \frac{MTTR_c}{MTTF_c + MTTR_c} \approx \frac{MTTR_c}{MTTF_c}, \quad \text{since } MTTR_c \ll MTTF_c$$

The probability that the backup site fails is:

$$P_b = \frac{1}{MTTF_b}$$

The probability that k backup sites fail and the corresponding coordinator is down is:

$$P_b^{\,k} \times P_c = \left(\frac{1}{MTTF_b}\right)^{k} \times \frac{MTTR_c}{MTTF_c} = \frac{MTTR_c}{(MTTF_b)^{k} \times MTTF_c}$$
From the above equation, it can be observed that the probability that all k backup sites fail while the corresponding coordinator is down is very small. Thus, in case of coordinator site failure, the introduction of backup sites significantly reduces the blocking probability compared with 2PC.
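To make the reduction concrete, the formulas above can be evaluated for a few values of k. The Python sketch below is illustrative only: the MTTF and MTTR figures (1000 h and 4 h) are hypothetical, not measurements from the paper, and the function names are ours. It applies the paper's approximation $P_b = 1/MTTF_b$ literally, which only reads as a probability when $MTTF_b$ is expressed in a time unit where it exceeds 1 (here, hours).

```python
def p_coordinator_down(mttf_c: float, mttr_c: float) -> float:
    """P_c = MTTR_c / (MTTF_c + MTTR_c)."""
    return mttr_c / (mttf_c + mttr_c)

def p_blocking(mttf_c: float, mttr_c: float, mttf_b: float, k: int) -> float:
    """(1 / MTTF_b)^k * MTTR_c / MTTF_c, the approximation used above."""
    return (1.0 / mttf_b) ** k * (mttr_c / mttf_c)

# Hypothetical figures, all in hours.
MTTF_C, MTTR_C, MTTF_B = 1000.0, 4.0, 1000.0
print(f"coordinator unavailable: {p_coordinator_down(MTTF_C, MTTR_C):.2e}")
for k in (1, 2, 3):
    print(f"k={k} backups also failed: {p_blocking(MTTF_C, MTTR_C, MTTF_B, k):.2e}")
```

With these numbers the coordinator is unavailable with probability of roughly 4e-3, while the probability that it is down together with one backup site drops to about 4e-6, and each additional backup site divides it by a further factor of 1000.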

Furthermore, the purpose of the backup site is to terminate the blocked transactions at the participant sites while the corresponding coordinator is down. Once the blocked transactions have been terminated, a subsequent failure of the backup site does not affect the consistency of the database. Let term_time be the time required to terminate the blocked transactions by contacting the backup site while the coordinator is down. The above equation gives the probability that the backup site fails at some point during the entire period (MTTR_c) in which the coordinator is down. However, even in the worst case, in which the backup site is up only during term_time and then fails, the blocked transactions are terminated consistently. As term_time (a few minutes) is much shorter than the coordinator's down time (a few hours), the probability that the backup site fails during term_time while the coordinator is down is reduced further, to a negligible value.
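The effect of shrinking the exposure window from MTTR_c to term_time can be illustrated numerically. Note that this Python sketch swaps in a standard exponential (memoryless) model of backup-site failures, which the paper does not state; the window lengths and MTTF below are hypothetical. Under that model the probability of a failure within a window t is $1 - e^{-t/MTTF_b}$.

```python
import math

def p_fail_within(window: float, mttf: float) -> float:
    """P(failure within `window`), assuming exponentially distributed time to failure."""
    return -math.expm1(-window / mttf)  # 1 - exp(-window / mttf), computed accurately

MTTF_B = 1000.0          # hypothetical backup-site MTTF, hours
MTTR_C = 4.0             # hypothetical coordinator down time, hours
TERM_TIME = 5.0 / 60.0   # hypothetical term_time: 5 minutes, in hours

whole_outage = p_fail_within(MTTR_C, MTTF_B)
term_only = p_fail_within(TERM_TIME, MTTF_B)
print(f"{whole_outage:.2e} vs {term_only:.2e} (about {whole_outage / term_only:.0f}x smaller)")
```

For small windows the ratio is close to MTTR_c / term_time, here roughly 48, which is the "further reduction" the paragraph above describes.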

5.2 Message Overheads, Latency and Recovery 

Compared with 2PC, BC requires extra messages and extra time to commit a transaction (to communicate with the backup sites). Compared with 3PC, however, BC requires only a fixed time during the second phase, independent of the number of participants.

In BC, the latency of the second phase is considerably lower than in 3PC. Moreover, by choosing a site near the coordinator as one of the backup sites, the latency can be minimized (the coordinator communicates with the backup sites in parallel). This brings the performance of BC close to that of 2PC while achieving the non-blocking property in most coordinator failures.
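To see why the backup round costs little, the time until the coordinator can write its commit record can be counted in one-way message delays. This Python sketch is a simplified model, not the paper's simulator: it ignores log forces and CPU/I/O processing, uses one trans_time for every participant (so it does not capture waiting for the slowest participant), and the function name and the 5 msec backup delay are hypothetical.

```python
def decision_latency(protocol: str, trans_time: float, backup_trans_time: float = 0.0) -> float:
    """One-way message delays until the coordinator can write its commit record.

    Simplified model: log forces, CPU and I/O processing are ignored, and
    all messages of one round are sent in parallel.
    """
    prepare_round = 2 * trans_time                       # PREPARE + VOTE_COMMIT
    if protocol == "2PC":
        return prepare_round
    if protocol == "BC":
        return prepare_round + 2 * backup_trans_time     # DECIDED_TO_COMMIT + RECORDED_COMMIT
    if protocol == "3PC":
        return prepare_round + 2 * trans_time            # PREPARE_TO_COMMIT + READY_TO_COMMIT
    raise ValueError(f"unknown protocol: {protocol}")

for t in (100.0, 1000.0):  # trans_time in msec
    print(t, [decision_latency(p, t, backup_trans_time=5.0) for p in ("2PC", "BC", "3PC")])
```

BC's extra cost depends only on backup_trans_time, so as trans_time grows its latency approaches that of 2PC, while 3PC pays a second full round trip to the participants.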

The recovery overhead is also considerably reduced in BC because, after recovery, the coordinator terminates the transactions consistently by contacting only the backup sites. Only in the rare case in which it cannot contact the backup sites does it have to request state information from all the participants. In case of partition failures, BC terminates the transactions consistently, although, like 2PC and 3PC, it may block.

6 Simulation Experiments 

We carried out simulation experiments to compare the performance of BC with that of both 2PC and 3PC.

The meaning of each model parameter of the simulation is given in Table 1. The size of the database is assumed to be db_size data objects. The database is uniformly distributed across all num_sites sites. Each new transaction is assigned an arrival site chosen randomly over the num_sites sites. The parameter trans_size is the average number of data objects requested by a transaction; it is computed as the mean of a uniform distribution between max_size and min_size (inclusive). The probability that an object read by a transaction will also be written is determined by the parameter write_prob. The parameter trans_time is the time required to transmit a message between sites. The parameter backup_trans_time is the time required to transmit a message between the coordinator and the corresponding backup site. The parameter local_to_total is the ratio of the number of local requests to the total number of requests of a transaction. The parameter res_io is the amount of time taken to carry out an I/O request, and the parameter res_cpu is the amount of time taken to carry out a CPU request. Accessing a data object requires res_cpu and res_io. Also, each message of the 2PC protocol requires res_cpu and res_io for processing. The total number of concurrent transactions active in the system at any time is specified by the multiprogramming level (MPL).

The communication network is modeled simply as a fully connected network. Any site can send messages to all the sites at the same time. Wide-area network behavior is realized by varying trans_time. This is justified because, in DDBSs, even though the delay in receiving responses to remote requests varies considerably, a transaction does not complete its execution until it receives responses from all the remote sites. We employ the static distributed two-phase locking algorithm for concurrency control.

The settings of res_io, res_cpu, db_size, trans_size, min_size, and max_size given in Table 1 are adopted from [1]. The local_to_total ratio for a transaction is fixed at 0.6 [5]. Thus, 60 percent of the data objects are randomly chosen from the local database and 40 percent from the remaining database sites. A transaction writes all the data objects it reads (write_prob is set to 1). With these settings, varying the MPL produces sufficient variation in data contention.
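The workload just described can be sketched as a transaction generator. This is a minimal Python illustration using the Table 1 settings, not the authors' simulator. Two details are assumptions the text does not specify: objects are placed on sites in contiguous equal-sized ranges (the hypothetical `site_of`), and the 60 percent local share is applied per transaction by rounding.

```python
import random

# Settings from Table 1.
DB_SIZE = 1000
NUM_SITES = 5
MIN_SIZE, MAX_SIZE = 4, 12
LOCAL_TO_TOTAL = 0.6
WRITE_PROB = 1.0

def site_of(obj: int) -> int:
    """Hypothetical placement: contiguous, equal-sized ranges of objects per site."""
    return obj * NUM_SITES // DB_SIZE

def make_transaction(rng: random.Random) -> tuple[int, list[int], set[int]]:
    """Return (arrival site, objects read, objects written) for one new transaction."""
    home = rng.randrange(NUM_SITES)               # arrival site, uniform over num_sites
    size = rng.randint(MIN_SIZE, MAX_SIZE)        # uniform and inclusive: mean trans_size = 8
    n_local = round(LOCAL_TO_TOTAL * size)        # 60 percent local requests (rounded)
    local = [o for o in range(DB_SIZE) if site_of(o) == home]
    remote = [o for o in range(DB_SIZE) if site_of(o) != home]
    reads = rng.sample(local, n_local) + rng.sample(remote, size - n_local)
    writes = {o for o in reads if rng.random() < WRITE_PROB}  # write_prob = 1: write every read
    return home, reads, writes

rng = random.Random(42)
home, reads, writes = make_transaction(rng)
print(home, len(reads), sum(site_of(o) == home for o in reads))
```

Sampling without replacement keeps the objects of one transaction distinct, and the mean of a uniform integer size between 4 and 12 is exactly the trans_size of 8 objects listed in Table 1.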



Table 1. Model parameters with settings

Parameter           Meaning                                     Value
db_size             Number of objects in the database           1000
num_sites           Number of sites in the system               5 sites
trans_size          Mean size of a transaction                  8 objects
max_size            Size of the largest transaction             12 objects
min_size            Size of the smallest transaction            4 objects
write_prob          Pr(write X | read X)                        1
local_to_total      Local requests / total requests             0.6
res_cpu             CPU time                                    15 msec
res_io              I/O time                                    35 msec
MPL                 Multiprogramming level                      Simulation variable
trans_time          Transmission time between two sites         Simulation variable
backup_trans_time   Transmission time to contact backup site    0

The primary performance metric of our experiments is throughput, that is, the number of transactions completed per second. Since we use a closed queuing model, the inverse relationship between throughput and response time (by Little's law) makes either one a sufficient performance metric.
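In a closed model with a fixed number of transactions N (here the MPL), Little's law gives throughput X = N / (R + Z), where R is the mean response time and Z the think time. The Python sketch below assumes Z = 0 (a finished transaction is immediately replaced to keep the MPL constant), which the text does not state explicitly; the numbers are hypothetical.

```python
def throughput(mpl: int, response_time: float, think_time: float = 0.0) -> float:
    """Little's law for a closed system: X = N / (R + Z), in transactions per second."""
    return mpl / (response_time + think_time)

def response_time(mpl: int, throughput_tps: float, think_time: float = 0.0) -> float:
    """The same law solved for R."""
    return mpl / throughput_tps - think_time

# Hypothetical: MPL = 30 concurrent transactions, 2.5 s mean response time.
x = throughput(30, 2.5)
print(x, response_time(30, x))  # 12.0 2.5
```

At a fixed MPL, knowing one metric determines the other, which is why the experiments report throughput alone.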

In all simulation experiments, we set backup_trans_time to 0. (Note that in the BC protocol it is sufficient for the coordinator to communicate the final decision to at least one backup site.) That is, we assume that the coordinator and at least one backup site are connected by a high-speed local area network.

Figure 1 shows the throughput at different MPL values with the transmission time (trans_time) set to 0. This graph shows the behavior of the BC protocol in local-area-network-based environments. In a DDBS, increasing the MPL results in higher resource contention; as a result, more transactions wait for resources. The throughput of BC is slightly lower than that of the 2PC protocol, which reflects the overhead of the BC protocol over the 2PC protocol in local-area-network-based environments.

Figure 2 shows the throughput at different MPL values with the transmission time set to 1000 msec, that is, assuming a wide-area network environment. As the transmission time increases, the effect of communicating with the backup site becomes negligible. As a result, the throughput curve of the BC protocol is close to that of the 2PC protocol.

Figure 3 shows the throughput at different transmission times with the MPL set to 30. As the transmission time increases, a transaction spends longer in commit processing; consequently, more transactions wait longer for data objects, and throughput decreases. With longer transmission times, the overhead of the BC protocol over the 2PC protocol becomes negligible, and the throughput curves of BC and 2PC coincide.



7 Summary and Conclusions 

In this paper, we have proposed the BC protocol for commit processing in DDBSs, which exhibits non-blocking behavior in most coordinator failures. In this protocol, multiple backup sites are attached to the coordinator site. The protocol incurs little overhead over the 2PC protocol (the messages and time required to write to the backup site). By selecting a site near the coordinator as one of the backup sites, this overhead can be made negligible, which brings the performance of the BC protocol close to that of the 2PC protocol. The BC protocol also preserves the consistency of the database in case of partitioning failures. We have shown analytically that the probability that the coordinator and the corresponding backup sites are down at the same time is significantly reduced. The simulation results also show that the performance of the BC protocol is very close to that of the 2PC protocol. The BC protocol is well suited for commit processing in DDBS (Internet) environments, where site failures are frequent and messages take longer to be delivered.
Abstract. The growth of the Internet is stimulating an increasing number of autonomous information systems to cooperate. Information systems are heterogeneous and distributed in nature, and making them interoperate for cooperation is not easy. The diversity of information sources generates incompatibilities that must be surmounted before the cooperation of information systems becomes feasible. However, few approaches deal with the cooperation of separately developed systems in a heterogeneous distributed environment. Agent-oriented technology can be leveraged to enhance the modeling of cooperative information systems by providing a flexible design and reusable applications. Furthermore, XML is a powerful tool for data representation, storage, and interoperation. This paper describes the major design goals of an agent-based architecture that supports the cooperation of heterogeneous distributed information systems. It also sketches the implementation, which uses XML in the CORBA middleware, to show the impact on cooperation efficiency. Examples are given from the supply chains of manufacturing enterprises.



1 Introduction 

The field of autonomous collaborating agents is becoming increasingly attractive for systems in which problem solving and decision making must be distributed. To avoid performance and scalability problems, we use a distributed agent-based approach instead of a centralized one. The term "agent" is currently used in many different ways [1]. In this paper, an agent is an abstraction of an information system (IS), a human, or a workflow.

The rapidly increasing extent of large computer networks, such as the Internet, the World Wide Web, and corporate intranets, is due to the need to share information and resources within and across diverse computing enterprises. Consequently, system cooperation now requires even richer models and mechanisms that permit heterogeneous systems to collaborate. However, the few approaches suggested for dealing with the cooperation of separately developed systems in a heterogeneous distributed environment have not proved satisfactory in industry. This is essentially due to the prominent problem of heterogeneity. The majority of work has focused on developing theoretical methodologies [2], [3] or ad hoc architectures based on collaboration among autonomous, usually homogeneous, agents [4]. To overcome the crucial problem of heterogeneity, we use XML (Extensible Markup Language) technology [5], since it helps developers build and deploy sophisticated Web applications faster. XML allows the agents to understand each other. It also hastens the enterprise integration efforts that are so important to supply chains and other business collaboration initiatives.

Furthermore, middleware solutions such as CORBA [6] provide security to authenticate users. They also control access to resources and provide a host of other support functions that keep computer networks running smoothly. In these kinds of distributed computing applications, the data are transient: they are transferred between computers, often not permanently stored anywhere, and probably never read by a human. However, if the requirement is to store data for the long term and to extract human-readable documents, as in cooperating heterogeneous distributed ISs, then XML is the more appropriate medium. Consequently, XML and middleware are complementary; neither will replace the other. We use these two technologies together to support the architecture suggested in this paper. CORBA is chosen as the middleware because it is a mature, standard technology [6].

This paper has two goals. First, it proposes an agent-based architecture, which is 
provided as a means for building Cooperative Information Systems (CISs) on top of 
existing systems. This architecture allows one to offer information services that meet 
evolving organizational objectives. The second goal is to show how the combination of two standard technologies (XML and CORBA) affects the quality and efficiency of cooperation.

The paper is organized as follows. The next section provides an overview of some suggested approaches and architectures for CISs. Section 3 presents the proposed agent-based architecture supporting cooperation among heterogeneous distributed pre-existing ISs. Section 4 discusses the cooperation process. Section 5 describes the use of XML technology in the cooperation process whenever agents interchange information. Section 6 shows the impact of using XML and CORBA on cooperation efficiency. In the last section, we conclude and present future work.



2 Overview of CISs 

Different people mean different things by the term CISs [7]. The term is essentially understood as support for cooperation through databases, as a way to derive a structure from data sources that do not always have a known structure, or as the integration of a social dimension. In our context, the term means cooperation between heterogeneous distributed ISs, for which some definitions and architectures have been suggested.

In [8], CISs involve the integration of distributed information sources that span both the database and knowledge-based system domains. Papazoglou et al. consider the community of information agents that jointly execute a common task as a large IS called an Intelligent and Cooperative Information System (ICIS). The described architecture draws on techniques and functions stemming from the object paradigm and knowledge-based systems. This proposal is oriented more toward the internal description of an agent than toward the interactions through which the cooperation process develops.

Aarsten et al. [9] have proposed a decentralized agent-based approach built on the G++ pattern. The authors minimize the standard representation required of each agent, i.e., the required set of capabilities places fewer restrictions on the types of agents. Fewer restrictions also make the agents that can be integrated into the global system less complex. However, a limited set of capabilities can mean fewer possibilities for information exchange, and therefore less profitable collaboration. The Client/Server/Service pattern permits an agent to perform several services concurrently. Each capability interface is mapped to a C++ abstract class. This relies on multiple inheritance and creates complex inheritance dependencies, which are better avoided in a distributed context.

Dubois [10] considers CISs as federations of various heterogeneous sources of data and knowledge distributed over wide area networks. His proposal highlights the contribution of multi-agent systems to IS cooperation. The defined architecture is based on the combination of objects and agents. The principal idea is to build a multi-agent system that exploits distributed artificial intelligence techniques to solve problems of IS cooperation. Unfortunately, the author has not focused on interaction and communication among agents during collaborative problem solving.

In all these approaches, configuration has often been achieved with custom, non-portable data formats and mechanisms. An XML-based configuration is portable and can be used generically. Using XML to convey data in a CORBA-based system makes the system more flexible.



3 Agent-Based Architecture 

The agents enhance organizational flexibility because the collaboration they allow is not exclusively at the syntactic level, at which the organization members (agents) agree on a common set of data structure definitions and on the meaning of the operations on those structures, but rather at the semantic level, where the organization members communicate in terms of knowledge transfer instead of data transfer. The key general characteristics of our agents include autonomy, social ability, and pro-activity [1]. Autonomy permits agents to operate without the direct intervention of other agents (ISs, humans). Social ability is taken into account since an agent can interact with other agents. The agents are pro-active in the sense that they can exhibit goal-directed behavior by taking the initiative. Our agents are also informational because they perform the role of managing, manipulating, or collating information from many distributed sources.

Unlike other systems, which adopt a hierarchical architecture [10], [11], [12], the underlying structure of the distributed organization in CISs is flat, and the high-level control activity is distributed among all the agents. In a centralized agent-based approach, the manager agent controls the execution of all the tasks in a given organization. It is therefore a single point of failure, so the availability of the organization depends on a single site. Consequently, a distributed agent-based approach is much more adequate than a centralized one. The balance between the heterogeneity of ISs and the homogeneity provided by the agents strongly affects the kind of information that each agent should maintain about the others, and therefore the flexibility of the entire system.

The suggested architecture is based on two levels: the physical entities level and the agent level.



3.1 Physical Entities Level 

This level accommodates existing information, workflow, and other computer-based systems. These systems are developed with conventional technologies such as programming languages, database management systems, and workflow systems, and they execute on conventional distributed hardware and software. These systems were intended to serve local needs. The behavior of an organization depends on the way each system decomposes the assigned task and on the collaboration relations established with the other systems of the organization.



3.2 Agent Level 

A generic agent architecture (see fig. 1) consists of four main components: the 
perception interface, the communication interface, the operation facilities, and the 
repository. 




Fig. 1. A generic agent architecture 



The perception interface allows other agents to get the function descriptions of 
that agent. 

The communication interface manages the communication between the agent and 
the outside world (other agents, physical entities level, and environment). 

The operation facilities hold two sub-components: the central control and the coordination mechanism. The central control executes different functions such as a Graphical User Interface (GUI), goal interpretation, plan change, and the eventual participation of this agent in two or more organizations. The coordination mechanism allows the organization to achieve coordinated behavior. This mechanism is based on the social constraints [13] to which an agent in an organization is subject. These constraints are made of organizations that are considered as systems. These systems constrain the actions of member agents by imposing mutual obligations and interdictions, the roles played by an agent, and the goals and plans that it may adopt.

The repository stores the social constraints, the organization history for possible 
reuse, and other elements such as the GUI.
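To make the four components more concrete, here is a minimal Java sketch of the generic agent of Fig. 1. All class, method, and field names are hypothetical: the architecture names the components but not their programming interface. As a simplifying assumption, the social constraints are reduced to a set of interdictions that the coordination mechanism checks before the agent accepts a request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the generic agent of Fig. 1; all names are illustrative.
final class GenericAgent {
    private final List<String> capabilities;   // exposed by the perception interface
    private final Set<String> interdictions;   // social constraints kept in the repository
    private final List<String> history = new ArrayList<>(); // organization history (repository)

    GenericAgent(List<String> capabilities, Set<String> interdictions) {
        this.capabilities = List.copyOf(capabilities);
        this.interdictions = Set.copyOf(interdictions);
    }

    // Perception interface: lets other agents read this agent's functional description.
    List<String> describeFunctions() {
        return capabilities;
    }

    // Coordination mechanism (operation facilities): an action is allowed only if it
    // is one of the agent's capabilities and is not forbidden by a social constraint.
    boolean mayPerform(String action) {
        return capabilities.contains(action) && !interdictions.contains(action);
    }

    // Communication interface: answers a request and records it for later reuse.
    String receive(String sender, String action) {
        boolean allowed = mayPerform(action);
        history.add(sender + ":" + action + ":" + (allowed ? "accepted" : "refused"));
        return allowed ? "accept" : "refuse";
    }

    List<String> history() {
        return List.copyOf(history);
    }
}
```

Keeping the interdictions in the repository rather than hard-coding them in `receive` reflects the idea that the same agent can join different organizations with different constraints.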



4 Inter-agent Cooperation in Heterogeneous ISs Environment 

The cooperation between the defined agents proceeds in two stages: the valuation of 
whether a goal can be achieved, and the execution of the collaboration process. In the 
first stage, communication takes place only among the agents. The goal assigned to an 
agent is first interpreted by that agent, called the leader. It is important for the 
leader to fully understand the goals, characteristics, and operating principles of the 
business system. This interpretation allows the leader to elaborate the strategic 
planning of the global activity, inspired by the methodology of Brumec et al. [14]. 
Applying this methodology should produce a project document containing the business 
processes and data, the technical resources, and the development activity plan. The 
leader then negotiates with other agents, which attempt to satisfy some sub-goals with 
given attributes whose values model the quality of service. The purpose of this 
negotiation is to optimize the use of mutual resources and capabilities. The queried 
agents, which may in turn communicate with other agents, reply in the affirmative 
(without exceeding the reply deadline) if they support the sub-goals with the 
requested attributes or with other options. Once each agent has given its best offer, 
the leader evaluates whether the global goal can be realized within the required 
costs, delivery time, and other performance criteria.
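The leader's final check in the valuation stage can be sketched as follows. This is a minimal Java sketch under our own assumptions, not the method of the paper: each queried agent returns one best offer per sub-goal with a cost and a delivery time, costs add up, and sub-goals run in parallel so the global delivery time is the latest one. The `Offer` record and `LeaderValuation` class are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical best offer returned by an agent for one sub-goal.
record Offer(String agent, double cost, int deliveryDays) {}

final class LeaderValuation {
    // True when every sub-goal has an offer and the combined offers meet the
    // global budget and deadline. Assumes sub-goals execute in parallel.
    static boolean goalIsRealizable(List<String> subGoals, Map<String, Offer> bestOffers,
                                    double budget, int deadlineDays) {
        double totalCost = 0;
        int latest = 0;
        for (String subGoal : subGoals) {
            Offer offer = bestOffers.get(subGoal);
            if (offer == null) {
                return false; // no agent supported this sub-goal before the deadline
            }
            totalCost += offer.cost();
            latest = Math.max(latest, offer.deliveryDays());
        }
        return totalCost <= budget && latest <= deadlineDays;
    }
}
```

With sequential sub-goals the delivery times would be summed instead of taking their maximum; which rule applies depends on the plan produced by the strategic planning step.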

In the second stage, communication takes place in two ways: among the agents only, 
and between the agents and their underlying physical entities. These two ways of 
communicating are described in more detail in Section 6. The leader manages the 
collaboration process by ensuring the coordination of tasks among the agents 
(organization members). It supervises this coordination by taking into account the 
members’ commitments established during the negotiation of the valuation stage.



5 Representation of Structured Information Using XML 

We start by briefly describing our application domain, which is the cooperation of 
supply chains of manufacturing enterprises. A supply chain is a globally extended 
network of suppliers, factories, warehouses, and distribution centers through which 
raw materials are acquired, transformed into products, delivered to customers, 
serviced, and enhanced. The different local factories are seen as cells, each 
operating to its own best advantage while cooperating with other cells. The main goal 
of this cooperation is to improve the overall performance of the whole system in 
terms of quality, costs, and response time. The key to the efficient operation of such 
a system is tight coordination among its components. As shown below, this 
coordination is expressed by social constraints such as organizations, roles, 
obligations, interdictions, and goals.

The agents share goals and/or resources, so many forms of data are exchanged. The 
difficulty in representing structured information with XML is that the resulting XML 
document must be both well-formed and valid. Hence, we give some examples, taken from 
the application described above, of social constraints exchanged between the agents. 
These examples show how to take advantage of XML technology in cooperating 
heterogeneous distributed ISs.



5.1 XML Conceptual Example 

XML is a document description meta-language. It is designed as a subset of SGML 
[15], from which it inherits the key concept of separating content from presentation, 
and it is recommended by the W3C (World Wide Web Consortium). It is often viewed as an 
extension of HTML; rather than a fixed set of tags, however, it lets applications 
define their own tags to structure, format, and search the information they contain.

Collaboration requires the agents to share homogeneous aspects so that they can 
understand each other. However, the existing ISs in industry are intrinsically 
heterogeneous. XML and related technologies therefore make it easier for companies to 
develop distributed heterogeneous applications that can exchange formatted data.

The social constraints are exchanged during the negotiation protocol between a 
leader and the other agents that may be members of an organization. This kind of 
structured information must be represented in a common format so that it can be 
interpreted by all the participating agents within and across organizations. XML, 
which is fast becoming the standard for data interchange on the Web, is used to 
represent this information.

Among the social constraints, we present the organization entity, which has a goal to 
achieve with associated constraints and optimizations. An organization also consists 
of a set of roles filled by a number of agents. Let us consider an example of an 
organization expressed as an XML document.

Example of a Computer Program specifying an organization entity using XML 

<?xml version='1.0' encoding='us-ascii'?>
<organization name-org="milling machines production">
  <goal name-goal="to produce 50 milling machines">
    <constr-goal>delay</constr-goal>
    <constr-goal>dimension</constr-goal>
    <optim-goal>time</optim-goal>
    <optim-goal>quality</optim-goal>
    <theories><rule rule1="shop trial">T1</rule></theories>
  </goal>
  <agent name-agent="milling machines/turning cell">
    <roles>requirements plan</roles>
    <roles>machining</roles>
    <roles>assembling</roles>
  </agent>
  <agent name-agent="purchase cell">
    <roles>supplying</roles>
  </agent>
</organization>

In the example above, the goal of the “milling machines production” organization is 
“to produce 50 milling machines”. The constraints associated with this goal are 
“delay” and “dimension”. The execution of the goal may be optimized for “time” and 
“quality”. Finally, theory “T1” can be used to reason about whether the constraints 
are satisfied. “Requirements plan”, “machining”, and “assembling” are roles filled by 
the milling machines/turning cell agent; “supplying” is a role filled by the purchase 
cell agent.

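The “organization” XML document is well-formed because it follows the 
well-formedness rules of the XML specification: it has a single root element, every 
start tag has a matching end tag, elements are properly nested, and attribute values 
are quoted.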
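Well-formedness can be checked mechanically with any XML parser. The sketch below uses the standard Java `javax.xml.parsers` API; only the class and method names are ours. A non-validating parser, which is the default, enforces the well-formedness rules but ignores any DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {
    // Returns true if the text parses as a well-formed XML document. The default
    // (non-validating) parser checks well-formedness only, not conformance to a DTD.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and ignores warnings, so malformed
            // input ends up in the catch block without extra console output.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

For instance, a fragment such as `<organization><goal></organization>` is rejected because `goal` is never closed.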



5.2 Validity of the XML Document 

After the XML declaration, the document prolog can include a Document Type 
Definition (DTD) [5], which specifies the kinds of tags that can be included in the 
XML document. A validating parser (XML processor) then uses the DTD to check which 
tags are valid, in what arrangements, and where text is expected. The formal 
definition of the “organization” XML document (presented in the example above) as a 
DTD is shown in the example below.

Example of a Computer Program showing the DTD for the “organization” XML document 

<?xml version='1.0' encoding='us-ascii'?>
<!ELEMENT organization (goal+, agent+)>
<!ATTLIST organization name-org CDATA #REQUIRED>
<!ELEMENT goal (constr-goal*, optim-goal*, theories)>
<!ATTLIST goal name-goal CDATA #REQUIRED>
<!ELEMENT constr-goal (#PCDATA)>
<!ELEMENT optim-goal (#PCDATA)>
<!ELEMENT theories (rule+)>
<!ELEMENT rule (#PCDATA)>
<!ATTLIST rule rule1 CDATA #REQUIRED
               rule2 (AND|OR|IMPLIES) #IMPLIED
               rule3 CDATA #IMPLIED>
<!ELEMENT agent (roles+)>
<!ATTLIST agent name-agent CDATA #REQUIRED>
<!ELEMENT roles (#PCDATA)>
The “organization” XML document is valid because it conforms to its DTD.
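Validity can be checked by switching the standard Java parser to validating mode. The sketch below inlines the DTD as an internal subset of the DOCTYPE declaration; that is our assumption, since the paper does not say how the DTD is attached to the exchanged documents. Class and method names are hypothetical. Validity errors are reported to the error handler as recoverable errors, so the handler rethrows them to make them fail the check.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class OrganizationValidator {
    // The DTD of the "organization" document, inlined as an internal subset.
    static final String DTD =
          "<!ELEMENT organization (goal+, agent+)>"
        + "<!ATTLIST organization name-org CDATA #REQUIRED>"
        + "<!ELEMENT goal (constr-goal*, optim-goal*, theories)>"
        + "<!ATTLIST goal name-goal CDATA #REQUIRED>"
        + "<!ELEMENT constr-goal (#PCDATA)>"
        + "<!ELEMENT optim-goal (#PCDATA)>"
        + "<!ELEMENT theories (rule+)>"
        + "<!ELEMENT rule (#PCDATA)>"
        + "<!ATTLIST rule rule1 CDATA #REQUIRED rule2 (AND|OR|IMPLIES) #IMPLIED rule3 CDATA #IMPLIED>"
        + "<!ELEMENT agent (roles+)>"
        + "<!ATTLIST agent name-agent CDATA #REQUIRED>"
        + "<!ELEMENT roles (#PCDATA)>";

    // Returns true if the root element (given without prolog) is valid against the DTD.
    static boolean isValid(String rootElement) {
        String doc = "<?xml version='1.0'?><!DOCTYPE organization [" + DTD + "]>" + rootElement;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enforce the DTD, not just well-formedness
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    throw e; // validity violations arrive here; treat them as failures
                }
            });
            builder.parse(new InputSource(new StringReader(doc)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

A document whose `organization` element lacks any `agent` child is well-formed but invalid, because the DTD declares `(goal+, agent+)`.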



6 XML and CORBA Supporting Inter-agent Collaboration 

The suggested collaboration process does not follow the Client/Server model but the 
peer-to-peer one. Each agent may be a Client or a Server; these two notions are rather 
considered as roles. For instance, a member of an organization can become a leader, in 
which case it switches from the Server role to the Client role.

When the requirement is to exchange data between cooperating computer systems, 
there are more efficient ways of defining and storing data. Traditionally, these 
definitions of data formats for machine communication are called Interface Definition 
Languages (IDLs) [6]. IDLs are used to specify the interfaces between cooperating 
computer applications. They define small packets of transient, machine-readable data 
that are exchanged between the heterogeneous components of a distributed 
organization. XML, by contrast, is used for the long-term storage of human-readable 
data that it is nevertheless useful to manipulate by machine.

The perception interface of an agent is described with the CORBA IDL. It contains 
the specification of the capabilities (information processes) of this agent. The 
interface is implemented in Java by calling the methods of the underlying physical 
entity. When a leader (Client) asks an agent (Server) for collaboration and the agent 
agrees, it answers by sending an XML document (as shown in Fig. 2). In this case the 
leader must include the Client Java program in its communication interface, while the 
member must include the Server and IDL Java programs in its own communication 
interface.




Fig. 2. XML and CORBA supporting inter-agent collaboration 



The leader launches the execution of the collaboration protocol once it has ensured 
that the global goal can be achieved. At this stage, both ways of communication 
(between the leader and its collaborators, and between an agent and its underlying 
physical entity) are supported by the CORBA middleware.
6.1 An Example of Using XML and CORBA During the Negotiation Protocol

We show through an example how the leader (Client) of an organization asks an 
agent for collaboration, indicating the capability it is interested in. The capability 
consists of a defined goal with some conditions and optimizations. The agent (Server) 
returns an XML document stating whether it accepts, refuses, or proposes new 
conditions. Implementing this negotiation protocol involves four development stages:

- definition of the interfaces of the member agents (Servers),

- implementation of these interfaces in Java,

- writing the Java programs that instantiate the member agents, and

- writing the leader (Client) Java programs that invoke calls on the member agents.

To establish a balance between the heterogeneity of ISs and the homogeneity of 
agents, we show through the first two stages how XML is used in the CORBA 
middleware.



Interface Specification of the Member Agents. Before a Client can make 
requests on an object, it must know the types of operations supported by the object. 
An object’s interface consists of a set of named operations and their parameters. 
Interfaces for objects are defined using the OMG IDL [16]. OMG requires all its 
services to be specified in a declarative language that emphasizes the separation of 
interface and implementation. The syntax of CORBA IDL is derived from C++, removing 
the constructs of an implementation language and adding a number of new keywords 
required to specify distributed systems. Let us give an example of an OMG IDL 
interface definition.

Example of a Computer Program specifying the interface perception of the member agents 
using IDL 

module myModule
{
  interface myinterface
  {
    string satisfyGoal(inout string goal, inout string conditions,
                       inout string optimizations);
  };
};

The definition presented in the example above specifies an interface named 
myinterface that supports one operation, satisfyGoal. This operation takes the 
goal to satisfy together with its associated conditions and optimizations (all 
declared inout), and returns a string. This string contains an XML document to be 
interpreted by the Client (leader); the agents therefore understand each other by 
exchanging XML documents.
Remark. In the example above, we are not concerned with the various tools that can be 
associated with an XML document, such as the Extensible Stylesheet Language (XSL) and 
Cascading Style Sheets (CSS) [5], which specify how the document should appear and 
can be written in several languages.



Interface Implementation of the Member Agents. As mentioned above, OMG 
IDL is just a declarative language, not a full-fledged programming language. It 
provides no control constructs and is not used directly to implement distributed 
applications. Instead, language mappings determine how OMG IDL features are mapped 
to the facilities of a given programming language. The OMG has standardized language 
mappings for several languages, such as C++ and Java [16]. The implementation of the 
interface of a member agent provides the code of each method.

Example of a Computer Program showing the implementation of IDL in Java 

package myModule;

import org.omg.CORBA.*;

public class myinterface_impl extends _myinterfaceImplBase
{
  public String satisfyGoal(StringHolder goal,
      StringHolder conditions, StringHolder optimizations)
  {
    System.out.println("The transmitted goal is: " + goal.value);
    System.out.println("The required conditions are: " + conditions.value);
    System.out.println("The wished optimizations are: " + optimizations.value);

    // The agent's decision: "accept", "refuse", or "propose". How it is
    // computed is not shown here; a fixed value serves as a placeholder.
    String answer = "accept";

    // The XML declaration must come first in the document, so the answer is
    // carried as an attribute; attribute values must be quoted.
    return "<?xml version='1.0' encoding='us-ascii'?>"
        + "<goal myGoal=\"" + goal.value + "\" answer=\"" + answer + "\">"
        + "<conditions>" + conditions.value + "</conditions>"
        + "<optimizations>" + optimizations.value + "</optimizations>"
        + "</goal>";
  }
}

The myinterface_impl Java class presented in the example above implements the 
myinterface interface from the IDL specification. It extends the 
_myinterfaceImplBase class, which is generated by the IDL compiler in the 
_myinterfaceImplBase.java file. The code of the satisfyGoal method shows that a 
member agent returns an XML document stating whether it accepts, refuses, or proposes 
new conditions and/or optimizations to the leader agent.
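The decision that the member encodes in its reply is not specified in the paper. The following minimal Java sketch shows one possible rule, under purely hypothetical assumptions. The only negotiable condition is a delivery delay in days. The agent knows its earliest feasible delivery and the largest slip it considers worth counter-proposing. The reply element and attribute names are illustrative, not the paper's format.

```java
import java.util.Locale;

// Hypothetical decision rule for a member agent; the paper does not specify one.
enum Decision { ACCEPT, PROPOSE, REFUSE }

final class MemberDecision {
    // Accept if the requested delay can be met, propose the earliest feasible delay
    // if it lies within the tolerated slack, otherwise refuse.
    static Decision decide(int requestedDays, int earliestDays, int slackDays) {
        if (earliestDays <= requestedDays) {
            return Decision.ACCEPT;
        }
        if (earliestDays - requestedDays <= slackDays) {
            return Decision.PROPOSE;
        }
        return Decision.REFUSE;
    }

    // Builds the reply document; the element and attribute names are illustrative.
    static String reply(int requestedDays, int earliestDays, int slackDays) {
        Decision d = decide(requestedDays, earliestDays, slackDays);
        int offered = d == Decision.PROPOSE ? earliestDays : requestedDays;
        return "<?xml version='1.0' encoding='us-ascii'?>"
            + "<reply decision=\"" + d.name().toLowerCase(Locale.ROOT) + "\">"
            + "<conditions>delay=" + offered + "</conditions>"
            + "</reply>";
    }
}
```

In a real deployment the rule would depend on the underlying physical entity, for example the cell's current production plan, which the member agent reaches through its communication interface.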
7 Conclusion and Future Work

This paper has described an agent-based architecture for managing large and complex 
data that are distributed and heterogeneous in nature, and for supporting the 
cooperative use of legacy ISs. The control and knowledge distributed among the agents 
guarantee flexibility and support reuse. The social constraints are necessary 
components of organizational behavior, allowing agents to execute coordinated 
actions.

We argue that the combination of XML and CORBA technologies can solve the real 
problem of heterogeneity in distributed ISs better and faster than was previously 
possible. Using XML as the data exchange format between agents guarantees the 
homogeneity of agents. We have shown how CORBA enables the creation of sophisticated 
distributed object systems on heterogeneous platforms, while XML allows users to 
transmit structured information within, between, and out of those systems. XML and 
CORBA are important in their own right; used together, they offer CISs valuable 
synergies. In addition, XML provides a portable data format that nicely complements 
Java’s portable code.

The suggested architecture provides facilities that enable computer and human 
agents to interoperate across disparate technologies and systems, and to collaborate 
across space and time. However, cooperation cannot be fully addressed unless the 
goals and desires of agents are explicitly modeled and reasoned about. Future work 
will therefore extend this approach by integrating explicit modeling of, and 
reasoning about, the mental states of agents [17]. These extensions will help to 
resolve conflicts during the negotiation protocol and to fulfill the expected goals. 
Semi-structured data can also be shared in organizations. The structure of this kind 
of data changes rapidly, or is not fully known when the data is collected, a situation 
that maps naturally onto the data structures described by XML. It will therefore be 
interesting to represent such structures with XML technology.
Abstract. In a continuously expanding mobile and wireless networking 
environment, mobile IP is the preferred standard for providing Internet 
connectivity to roaming mobile nodes. Under certain traffic conditions, such as 
in mobile IP networks supporting multimedia applications, overhead can cause 
delays at the mobility agents, i.e. the foreign and home agents. While the mobile 
IP standard does not exclude the possibility of using multiple home agents, it 
does not impose any particular model either; therefore, using only one home 
agent in a classic home network configuration can significantly bottleneck the 
IP data packet streams destined to the mobile nodes. In this paper, we develop 
a new simulator to evaluate the performance of a load balanced multiple home 
agents protocol extension, and we present several results obtained during a 
comprehensive simulation study of the system. We study several dynamic load 
balancing policies that distribute the traffic among several nodes in the home 
network, comparing the results obtained under different traffic shapes. We 
introduce a more realistic double-threshold load balancing policy and compare 
its behaviour with that of other dynamic and static policies. Using simulation, 
we also analyze the impact of varying the number of load balanced home agents 
on the overall system performance. The results show that some load balancing 
policies expected to perform better than others occasionally behave the 
opposite way. This turns out to be highly dependent on the traffic pattern, 
especially its degree of burstiness.



1 Introduction 

In this paper, we study a load balanced multiple home agents architecture for 
mobile IP, the Internet Protocol extension designed to provide proper mobility 
support. Mobile IP [11] has been a constant concern of a dedicated IETF 
(Internet Engineering Task Force) working group and is heading towards becoming an 
Internet standard. The standardization efforts have materialized in several RFC 
documents for both the IPv4 and IPv6 mobile IP extensions. In this paper we focus on 
the IPv4 case, although the load balancing policies we evaluate can also be used 
with IPv6. There are many mobile IP implementations worldwide, as the international 
community expects a significant increase in the number of potential mobile terminal 
users (an estimated 1 billion users by 2003) for whom mobility support must be 
provided.

In mobile IP networks, each mobile node has a fixed IP home address belonging 
to the home network. The home network is unique to a given mobile node: its 
network prefix matches the home address of the mobile node. While the mobile nodes 
roam away, packets addressed to their home address are received and forwarded to 
their present location by a fixed node residing in the home network, called the 
home agent. According to [11], a home agent is a router on a mobile node’s home 
network, which tunnels datagrams for delivery to the mobile node when it is away 
from home, and maintains current location information for the mobile node.
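The membership test behind a prefix "matching the home address" can be illustrated with a short Java sketch for IPv4. The class name, method names, and example addresses are hypothetical. The prefix is assumed to be given as a network address plus a prefix length.

```java
// Hypothetical helper: does an IPv4 address belong to a given network prefix?
final class HomeNetwork {
    // Parses dotted-quad IPv4 text into a 32-bit int.
    static int toInt(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + dotted);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // True if the first prefixLength bits of address equal those of network.
    static boolean matchesPrefix(String address, String network, int prefixLength) {
        if (prefixLength < 0 || prefixLength > 32) {
            throw new IllegalArgumentException("prefix length out of range");
        }
        // In Java, -1 << 32 equals -1 (shift counts are taken mod 32), so /0 is special-cased.
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        return (toInt(address) & mask) == (toInt(network) & mask);
    }
}
```

A node with home address 192.168.10.37 thus belongs to the home network 192.168.10.0/24, and it keeps that address wherever it roams.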

A home network can host several, in fact a theoretically unlimited number of, 
mobile nodes. While these nodes are away, a single home agent must forward IP 
datagrams, maintain caches, and manage registration messages for all the roaming 
nodes. This raises problems from at least two points of view: robustness and 
performance.

First, a single home agent servicing a large number of mobile nodes would 
disconnect all of them in case of a hardware failure, which is not desirable, 
especially in critical environments such as military applications. Chambless & 
Binkley [7] proposed a multiple home agents protocol extension (HARP, “Home Agent 
Redundancy Protocol”) based on a redundant-pair scheme. Their concern is primarily 
system robustness, and their model is designed to keep servicing all the mobile 
nodes, by varying the degree of redundancy, when one or even several home agents 
crash. In [6], another agent redundancy scheme is proposed and implemented. Its 
primary goal is fault tolerance, although some static load splitting is also 
provided.

From the performance point of view, a single home agent is a potential bottleneck: 
IP datagrams piling up in its waiting queue can cause significant delays in 
delivering the encapsulated data packets. Jue & Ghosal [1] proposed a protocol 
extension that allows the mobile nodes to be divided among several home agents, and 
also provided dynamic load balancing policies. The purpose of their simulation study, 
however, was to validate the numerical results obtained with their analytical model 
rather than to focus on the real case, and it therefore has certain limitations and 
restrictions. The results presented in their paper are limited to small values of 
the main parameters (number of sources and number of servers), and thus show little 
conformance to a real situation.

In our paper we present a simulator for studying the performance of multiple home 
agents networks and the behaviour of the load balancing policies described in [1]. 
We also analyze a more realistic load balancing policy based on the double threshold 
scheme proposed by Shiravatri & Kruger [5], and give a simulation-based comparison 
between several system configurations in order to reach a compromise between price 
and performance. This price/performance ratio is important especially at the stage 
where mobility support is added by an ISP (Internet Service Provider), so our 
simulator also makes it possible to determine the optimal number of home agents 
needed in a given network (with a given number of mobile nodes) delivering packets 
under a given traffic workload.

Overall, the results showed that the behaviour of the considered load balancing 
policies is highly dependent on the traffic workload pattern. We noticed that bursty 
traffic generally leads to lower performance (increasing response times for serviced 
IP packets), but also yields a larger load balancing gain (compared to a static load 
balancing policy). The stochastic component of the traffic generator can also induce 
significant variations in the mean response time of data packets.

In the next section we briefly introduce mobile IP, the protocols it uses, and the 
multiple home agents protocol extension. Section 3 describes the load balancing 
policies used in our simulator, while the simulator itself is presented in Sect. 4. 
The results are presented in Sect. 5 and, finally, the conclusions and possible 
directions for future work are given in Sect. 6.

2 System Description 

2.1 Basic Features 

Essentially, Mobile IP [2] [3] [11] is a modification and extension of the classic 
IP protocol that enables full Internet connectivity for roaming mobile hosts. 
Suppose a mobile node (MN) is registered with a local network (Home Network, HN). 
Without mobile IP, when the MN changes its physical location and registers with a 
different network (Foreign Network, FN), all its previous connections must be 
terminated and re-initiated, because the MN now has a different IP address in the 
FN. While several workarounds are possible (such as host-specific routing, which 
allows MNs to keep their IP address while roaming), they are not suitable for the 
Internet, and so far mobile IP has proved to be the most natural fit for this 
environment.

One of the mobile IP routing tasks (Fig. 1) consists of tunneling IP packets that 
come from Correspondent Hosts (CH), i.e. any host on the Internet, and are destined 
to the MN. This task is performed by a special entity called the home agent (HA), 
located in the MN’s home network.

An MN is assigned a fixed home IP address in the HN. Whenever the MN is 
attached to the home network, normal IP routing is used to deliver IP datagrams to 
and from it. When the MN registers on a different network (foreign network), it 
negotiates a different IP address (care-of address, COA) with an entity similar to 
the HA, called the foreign agent (FA), using protocols outside the scope of this 
paper (a co-located care-of address can also be used when there is no foreign 
agent). It then registers the COA with the HA, so that every time an IP datagram 
addressed to the MN’s home address arrives in the HN, the HA (using techniques 
described later) receives it on behalf of the “missing” MN, encapsulates it, and 
tunnels it to the last registered location of the MN. The foreign agent then 
receives it, decapsulates it, and forwards it to the MN. This way the MN receives the 
original IP datagram unaltered, and any reply it might wish to send to the 
originating host (CH) can be sent directly using normal routing. This is how the 
“triangle” routing scheme is created.

Fig. 1. Mobile IP
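The triangle path just described can be modeled in a few lines of Java. This is a deliberately simplified sketch, not RFC-accurate packet handling. Packets are records, and the home agent's bindings are a map from home address to COA. Encapsulation wraps the original datagram in an outer packet addressed from the HA to the COA, which is the idea behind IP-in-IP tunneling. All names and addresses are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified IP packet: outer addresses plus an optional encapsulated inner packet.
record Packet(String src, String dst, String payload, Packet inner) {
    static Packet plain(String src, String dst, String payload) {
        return new Packet(src, dst, payload, null);
    }
}

final class HomeAgent {
    private final String address;
    private final Map<String, String> bindings = new HashMap<>(); // home address -> COA

    HomeAgent(String address) {
        this.address = address;
    }

    // Registration: the MN reports its current care-of address.
    void register(String homeAddress, String careOfAddress) {
        bindings.put(homeAddress, careOfAddress);
    }

    // Intercepts a packet for a roaming MN and tunnels it to the registered COA.
    Packet tunnel(Packet original) {
        String coa = bindings.get(original.dst());
        if (coa == null) {
            return original; // no binding: the packet is not tunneled
        }
        return new Packet(address, coa, null, original);
    }
}

final class ForeignAgent {
    // Strips the outer header and hands the original datagram to the MN.
    static Packet decapsulate(Packet tunneled) {
        return tunneled.inner() == null ? tunneled : tunneled.inner();
    }
}
```

The MN's reply is an ordinary packet from its home address to the CH, so it bypasses the HA; that asymmetry is what closes the triangle.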

There are several proposals for eliminating this triangle scheme, which is 
considered inefficient, at least when the CH happens to be in the same subnet as the 
MN and both are very “far” from the MN’s HA. Route Optimization [14] enables the CH 
to send datagrams addressed to the MN’s COA, at the cost of maintaining additional 
binding caches with the temporary locations of the MN. This also means that mobile 
IP is no longer transparent to the source host.



2.2 Multiple Home Agents 

The main task of a home agent is to receive IP datagrams on behalf of the 
roaming mobile hosts currently registered with it, and to tunnel the encapsulated 
packets to their present location. When the number of serviced mobile hosts rises 
significantly, the HA may saturate and packets queue up, causing long delays. Jue & 
Ghosal [1] proposed a protocol extension that enables several nodes in the home 
network to become home agents, sharing the workload and improving the overall 
performance. Their protocol is based on the gratuitous/proxy ARP extensions also used 
in mobile IP.



Proxy/Gratuitous ARP. ARP (Address Resolution Protocol) [12] is responsible 
for resolving IP addresses to lower link layer (MAC) addresses. Each host on an 
Ethernet maintains an ARP table that translates a local IP address into the MAC 
address of a node.
Proxy ARP is used when a host answers ARP queries with ARP replies on behalf of 
another host that is momentarily not connected to the network. Gratuitous ARP is 
typically used in Ethernet when newly configured interface devices broadcast ARP 
replies without being queried. Such a broadcast causes every host in the subnet to 
update its ARP cache, mapping the IP address to the MAC address advertised in the 
ARP reply.
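The effect of a gratuitous ARP broadcast can be sketched as a cache overwrite on every host of the subnet. The Java model below is illustrative only. Class and method names and the MAC addresses are hypothetical, and real ARP caches also age entries out and may treat unsolicited replies differently. Each reply rewrites the IP-to-MAC entry, which is what lets one node take over the traffic addressed to another node's IP address.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of an ARP cache (no entry aging) and of a gratuitous ARP broadcast.
final class ArpCache {
    private final Map<String, String> ipToMac = new HashMap<>();

    // Any ARP reply, solicited or gratuitous, (re)writes the IP-to-MAC entry.
    void onArpReply(String ip, String mac) {
        ipToMac.put(ip, mac);
    }

    Optional<String> lookup(String ip) {
        return Optional.ofNullable(ipToMac.get(ip));
    }

    // A gratuitous ARP reply is broadcast unsolicited, so every host on the
    // subnet updates its cache with the advertised mapping.
    static void broadcastGratuitous(List<ArpCache> subnet, String ip, String mac) {
        for (ArpCache host : subnet) {
            host.onArpReply(ip, mac);
        }
    }
}
```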



The Home Agent Routing Mechanism. While the MN is away, the HA sends a 
gratuitous ARP message to all the hosts in the subnet, in particular the router(s), 
which update their ARP caches, mapping the home address of the MN to the link-layer 
(MAC) address of the HA. Thus, all the IP packets addressed to the MN’s home address 
are sent directly to the home agent, which encapsulates them and sends them on to 
the foreign location.



Multiple Agents Extension. Jue & Ghosal [1] propose a similar mechanism 
to add multiple agents functionality to the classic model. First of all, all home 
agents must be located in the home network. Each of them maintains an additional 
binding cache containing one entry for every roaming MN currently registered with any 
of the home agents; each entry holds the MN’s permanent IP address, temporary IP 
address, registration timeout, and a flag showing whether the MN is currently 
associated with the home agent owning the binding cache. When an MN sends a 
registration request to one of the home agents (chosen randomly), that home agent 
adds a new entry to its own binding cache table and then broadcasts a gratuitous ARP 
message informing the other HAs that it is currently handling the incoming IP packets 
for that MN. The other HAs update their cache entries, so an MN can only be 
registered with one HA at a time. The scheme allows MNs to be reallocated between the 
HAs in such a way that the process is completely transparent to the MNs. This allows 
dynamic load balancing methods to be implemented without actually removing packets 
from the HA queues, by redirecting the IP datagrams destined to a given MN according 
to the policy specifications.
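The shared binding cache can be sketched as follows. This is a simplified Java model under our own assumptions, with hypothetical names. There are no registration timers. The gratuitous ARP broadcast is reduced to a method call that every HA receives. Each HA keeps an entry per roaming MN, and the ownership flag is true at exactly one HA, so an MN is registered with one HA at a time and can be moved between HAs without the MN noticing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One binding-cache entry with the fields listed in the text.
record Binding(String homeAddress, String careOfAddress, long registrationTimeout,
               boolean ownedByThisAgent) {}

final class MultiHomeAgent {
    final String name;
    private final Map<String, Binding> cache = new HashMap<>(); // keyed by home address

    MultiHomeAgent(String name) {
        this.name = name;
    }

    // Every HA processes the announcement (modeling the gratuitous ARP broadcast);
    // only the announcing HA marks the entry as its own.
    void onAnnouncement(MultiHomeAgent owner, String home, String coa, long timeout) {
        cache.put(home, new Binding(home, coa, timeout, owner == this));
    }

    boolean serves(String home) {
        Binding b = cache.get(home);
        return b != null && b.ownedByThisAgent();
    }
}

final class HomeAgentGroup {
    private final List<MultiHomeAgent> agents;

    HomeAgentGroup(List<MultiHomeAgent> agents) {
        this.agents = List.copyOf(agents);
    }

    // Used both for an initial registration and for a load balancing transfer:
    // `owner` takes over the packet stream of the MN, transparently to the MN.
    void assign(MultiHomeAgent owner, String home, String coa, long timeout) {
        for (MultiHomeAgent ha : agents) {
            ha.onAnnouncement(owner, home, coa, timeout);
        }
    }

    // Returns the single HA currently serving the MN, or null if none does.
    MultiHomeAgent ownerOf(String home) {
        for (MultiHomeAgent ha : agents) {
            if (ha.serves(home)) {
                return ha;
            }
        }
        return null;
    }
}
```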

3 Load Balancing Policies 

MNs can be statically assigned to multiple HAs according to a predefined scheme. 
However, some MNs may receive packets according to different traffic patterns, 
causing the queues of some HAs to grow rapidly while others are almost idle. This can 
result in poor overall system performance, causing delays in delivering IP datagrams 
to the mobile hosts. Another solution is to use dynamic load balancing policies [4] 
that strive to keep the queues balanced, at the cost of some overhead generated by 
the state information packets that have to be exchanged among the nodes.

In our paper we study the effect of several load balancing policies on a simulation 
model based on the multiple agents protocol extension previously described. Jue and 
Ghosal [1] performed simulation studies on the multiple home agents scheme, but their 
simulation model had several assumptions and restrictions in order to remain 
compatible with their analytical model. In our simulations we found that the results 
are highly sensitive to apparently insignificant parameter changes; therefore, we 
strive to provide functionality as close as possible to the real case.

We also propose, test and evaluate a double threshold load balancing policy 
based on the scheme proposed in [5]. For reference and compatibility with the results 
obtained in [1], we also evaluate modified versions of the load balancing methods 
proposed in [1], comparing their results with those obtained using our scheme. In the 
following sections we describe all the load balancing policies considered in our 
study.

3.1 Transfer/Selection Pairs

When the IP packet stream is redirected from the current HA towards a different 
HA (for load balancing purposes), two sets of strategies must be considered.

One set decides when the above-mentioned redirection should occur (transfer
policy). More specifically, it triggers the transfer when certain local parameters
reach a predefined value. In this study, timer-based, counter-based and
threshold-based transfer policies are considered. The timer-based policy keeps a
timer $Tt_j$ at $HA_i$ for each mobile node $MN_j$ that it is currently serving.
When $Tt_j$ reaches a certain limit, the stream of IP packets corresponding to
$MN_j$ is transferred to another HA. The counter-based policy acts in a similar
way, except that a counter variable $Tc_j$ replaces the timer $Tt_j$; $Tc_j$ is
incremented every time an IP packet destined to $MN_j$ is serviced. Finally, the
threshold-based policy keeps a variable $Tth_j$ counting the number of IP packets
in the queue of $HA_i$ that are destined to $MN_j$. When $Tth_j$ exceeds some
limit (threshold), the corresponding stream is moved.
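The three transfer policies can be sketched as follows. This is a minimal Python illustration under our own assumptions: the class and method names (TimerTransfer, should_transfer, etc.) are hypothetical, the limits are free parameters, and the bookkeeping needed when a stream leaves the HA is omitted.

```python
from collections import defaultdict


class TimerTransfer:
    """Timer-based transfer: move MN_j's stream after `limit` seconds at this HA."""

    def __init__(self, limit):
        self.limit = limit
        self.assigned_at = {}  # MN id -> time the stream was assigned here (Tt_j starts)

    def assign(self, mn, now):
        self.assigned_at[mn] = now

    def should_transfer(self, mn, now):
        return now - self.assigned_at[mn] >= self.limit  # Tt_j reached the limit


class CounterTransfer:
    """Counter-based transfer: move the stream after `limit` packets were serviced."""

    def __init__(self, limit):
        self.limit = limit
        self.serviced = defaultdict(int)  # Tc_j per mobile node

    def on_packet_serviced(self, mn):
        self.serviced[mn] += 1

    def should_transfer(self, mn):
        return self.serviced[mn] >= self.limit


class ThresholdTransfer:
    """Threshold-based transfer: move the stream when more than `limit` of its
    packets are waiting in this HA's queue."""

    def __init__(self, limit):
        self.limit = limit
        self.queued = defaultdict(int)  # Tth_j per mobile node

    def on_enqueue(self, mn):
        self.queued[mn] += 1

    def on_dequeue(self, mn):
        self.queued[mn] -= 1

    def should_transfer(self, mn):
        return self.queued[mn] > self.limit  # "exceeds" the threshold
```

The timer and counter policies ignore the queue contents entirely, whereas the threshold policy reacts only to the backlog of one stream.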

The second set of strategies decides to which HA the stream is redirected.
When an HA decides, according to some local transfer policy, that it is time for
a stream transfer (a stream consisting of the incoming IP packets destined to a
mobile node MN), it selects another HA to handle the further packets of the
stream addressed to MN. The decision is made according to a selection policy,
which can be random, round-robin or (simple) threshold. The random policy
selects the next HA randomly, and the round-robin policy chooses the HA from a
list following a round-robin discipline. The threshold policy selects the
HA with the smallest number of packets destined to the respective MN.
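The three selection policies can be sketched in Python as below. The names are hypothetical, and for the threshold policy we simply assume that the per-HA packet counts for the given MN are available to the selecting HA; how they would be gathered is not specified here.

```python
import itertools
import random


def select_random(candidates, rng=random):
    """Random selection: pick any of the other HAs with equal probability."""
    return rng.choice(candidates)


class RoundRobinSelector:
    """Round-robin selection over a fixed list of HAs, skipping the caller."""

    def __init__(self, home_agents):
        if len(set(home_agents)) < 2:
            raise ValueError("need at least two distinct HAs")
        self._cycle = itertools.cycle(home_agents)

    def select(self, exclude):
        while True:
            ha = next(self._cycle)
            if ha != exclude:  # never hand the stream back to the HA giving it away
                return ha


def select_threshold(queued_for_mn, exclude):
    """Threshold selection: the HA holding the fewest packets destined to the MN.

    queued_for_mn maps HA id -> number of that MN's packets queued at the HA
    (an assumed input; its collection is left open).
    """
    others = {ha: n for ha, n in queued_for_mn.items() if ha != exclude}
    return min(others, key=others.get)
```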

3.2 Double Threshold Dynamic LB Scheme 

In this section we describe the double threshold based load balancing policy that 
has been tested on our simulator. 

All HAs keep track of the number of packets in the queue using two threshold
levels, a lower threshold (LT) and an upper threshold (UT). Each MN serviced
by the HA (in fact, all the packets destined to the MN) is assigned such a pair
of thresholds (LT, UT). In addition, each HA maintains a table (QST) listing all
the other HAs and their queue sizes. The QST tables are updated (by issuing a
local Queue Update event) every time a queuing event takes place at any HA. A
queuing event occurs at $HA_i$ every time its queue size moves into one of three
possible states, empty, normal and full, defined below (see also Fig. 2).




Fig. 2. Double threshold load balancing 



When the number of queued packets falls below LT, the server enters the
empty state. Between LT and UT it is in the normal state, whereas a queue
size above UT places the respective $HA_i$ in the full state. A home agent
$HA_j$ performs a queue update action by sending a table update packet to all
the other home agents in the home network. In our study, this packet type has
its own class and it influences the overall performance. While the mechanisms
used to implement, maintain and update each home agent's QST are outside the
scope of this paper, we suggest that broadcast or multicast techniques could be
used so that the QST update is performed consistently for all the home agents.
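The state logic of the double threshold scheme can be sketched as below. The sketch is ours, not the simulator's code. It applies a single (LT, UT) pair to the whole HA queue rather than per MN. It also stores in the QST the last state announced by each peer instead of its exact queue size. Finally, it assumes a transfer rule that the text does not spell out: a full HA hands a stream to a peer it believes to be empty. All names are hypothetical.

```python
from enum import Enum


class QState(Enum):
    EMPTY = "empty"
    NORMAL = "normal"
    FULL = "full"


def classify(queue_len, lt, ut):
    """Map a queue length to the three states of Fig. 2."""
    if queue_len < lt:
        return QState.EMPTY
    if queue_len <= ut:
        return QState.NORMAL
    return QState.FULL


class DoubleThresholdHA:
    """One HA under the double threshold scheme (simplified, hypothetical)."""

    def __init__(self, name, peers, lt, ut):
        self.name, self.lt, self.ut = name, lt, ut
        self.queue_len = 0
        self.state = classify(0, lt, ut)
        # QST, simplified: last state announced by each peer (assumed EMPTY at start).
        self.qst = {peer: QState.EMPTY for peer in peers}

    def on_queue_change(self, delta, send_update):
        """Apply an enqueue (+1) or dequeue (-1); announce only state changes."""
        self.queue_len += delta
        new_state = classify(self.queue_len, self.lt, self.ut)
        if new_state is not self.state:
            self.state = new_state
            send_update(self.name, new_state)  # table update (TU) packet to all peers

    def on_table_update(self, sender, state):
        self.qst[sender] = state

    def transfer_target(self):
        """Assumed rule: a FULL HA hands a stream to a peer it believes is EMPTY."""
        if self.state is not QState.FULL:
            return None
        empty = [peer for peer, s in self.qst.items() if s is QState.EMPTY]
        return empty[0] if empty else None
```

Because updates are sent only on state transitions, a queue that stays within one band generates no table update traffic. This is the mechanism behind the overhead trends discussed in Sect. 5.2.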

4 The Simulator 

In this section we describe the simulator developed in order to study the behavior 
of the considered load balancing policies. 
4.1 The Model

As one of our objectives is to evaluate the performance of the multiple HAs
system under different load balancing policies, our "all-in-one" simulation
model includes all the features necessary to implement them.

Our simulator has been developed using Sim++, an object-oriented simulation
library [10]. It is a discrete event simulator [8, 9] based on an event loop
that sequentially processes the events registered in a list according to their
time tags.
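The event-loop principle can be illustrated with a small, self-contained Python sketch based on a binary heap ordered by time tag. It does not reproduce the Sim++ API. The class name and the handler registry are illustrative assumptions, and the event names used with it are borrowed from Table 1 in Sect. 4.2.

```python
import heapq
import itertools


class EventLoop:
    """Minimal discrete-event loop: always process the earliest pending event."""

    def __init__(self):
        self.now = 0.0
        self.handlers = {}              # event type -> handler(loop, **data)
        self._heap = []                 # entries: (time, seq, event type, data)
        self._seq = itertools.count()   # tie-breaker: FIFO among equal time tags

    def schedule(self, delay, kind, **data):
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), kind, data))

    def run(self, until):
        while self._heap and self._heap[0][0] <= until:
            self.now, _, kind, data = heapq.heappop(self._heap)
            self.handlers[kind](self, **data)
```

For example, an ARRIVAL handler would typically schedule a REQUEST_SERVER event with zero delay. The sequence number guarantees that such same-time events are processed in the order they were scheduled.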

The simulation model consists of a multiple-server queuing system. All
servers are considered identical; each one simulates the behaviour of an HA
and is modeled as a multi-class, non-preemptive, single-queue server. Three
classes of jobs arrive in each server's queue. One class consists of data
packets (IP packets) sent by sources in the Internet to the mobile nodes.

The other classes are used in conjunction with the load balancing policies and
are part of the improved multiple HAs scheme: registration request (RR) packets
and table update (TU) notification packets (the latter only in the case of the
double threshold load balancing scheme). Only the data packets are further
tunneled by the home agents to the destination mobile nodes. We assume that
the data packets are identical in size, that is, they all have the same service
time. This assumption is fairly reasonable, as IP datagrams are usually split
into fragments whose size is limited by the MTU.

4.2 Description 

We describe the simulator from two different perspectives: structural and
functional.



Structure. Structurally, our simulator consists of several modules (units): 

— Workload generator unit; 

— Load balancing unit; 

— Queuing server management unit; 

— Results, analysis & confidence intervals unit. 

The data packets arrive at the servers according to an event-based traffic
workload generator. As this generator was designed as a standalone module,
different algorithms can be used within the same structure without major
alterations. The algorithm currently used is based on a two-state
(Markov-)modulated Poisson process. This algorithm (also used by [1]) aggregates
all the traffic destined to a certain mobile node into one source that can be in
an ON or OFF state. At the server (HA) side, packets arrive from one source with
exponentially distributed inter-arrival times during an ON period, while during
an OFF period the source is completely silent. The durations of the OFF and ON
periods are exponentially distributed with rates σ1 (the rate at which a source
turns ON) and σ2 (the rate at which it turns OFF), respectively. In our study we
varied σ1 and σ2 while keeping the average packet arrival rate constant, in
order to modify the degree of burstiness of the traffic.
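The ON/OFF source can be sketched as follows (Python, standard library only; the function names are our own). With σ1 the rate at which a source turns ON and σ2 the rate at which it turns OFF, a source is ON for a fraction σ1/(σ1 + σ2) of the time. Its long-run packet rate is therefore λσ1/(σ1 + σ2), which is the quantity that must stay fixed while σ1 and σ2 are varied to change the burstiness.

```python
import random


def on_off_arrivals(lam, sigma1, sigma2, horizon, rng):
    """Packet arrival times of one two-state ON/OFF (interrupted Poisson) source.

    OFF periods ~ Exp(sigma1)  (sigma1: rate of turning ON)
    ON periods  ~ Exp(sigma2)  (sigma2: rate of turning OFF)
    Poisson arrivals at rate lam while ON, none while OFF.
    """
    t, arrivals = 0.0, []
    on = rng.random() < sigma1 / (sigma1 + sigma2)  # start in the stationary regime
    while t < horizon:
        if on:
            end = t + rng.expovariate(sigma2)
            nxt = t + rng.expovariate(lam)  # memorylessness: restart each ON period
            while nxt < min(end, horizon):
                arrivals.append(nxt)
                nxt += rng.expovariate(lam)
        else:
            end = t + rng.expovariate(sigma1)
        t, on = end, not on
    return arrivals


def mean_rate(lam, sigma1, sigma2):
    """Long-run packet rate: lam times the fraction of time spent ON."""
    return lam * sigma1 / (sigma1 + sigma2)
```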
The load balancing unit implements the load balancing policies described in
Sect. 3. All the policies considered in our study are included, although only one
policy (or pair of policies) is used during a simulation run.

The queuing server management unit handles the packets arriving at each
server (HA) according to their class. A packet first arrives at a server and then
requests access to its facility. If access cannot be granted immediately, the
packet is placed in the queue. Finally, after being served, the packet is released.
The server routines use functions of the Sim++ simulation library [10].
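Assuming the queue is served in FIFO order, each non-preemptive server lets the response time of every job follow from a simple recursion. Service starts at the later of the job's arrival and the end of the previous job's service. A minimal Python sketch follows, in which the per-class service times are hypothetical values (the text only states that all data packets share the same service time).

```python
def fifo_response_times(jobs, service_time):
    """Non-preemptive, single-server FIFO queue.

    jobs: list of (arrival_time, job_class), sorted by arrival time.
    service_time: dict job_class -> deterministic service time.
    Returns the response time (waiting + service) of every job.
    """
    free_at = 0.0  # time at which the server finishes its current job
    out = []
    for arrival, cls in jobs:
        start = max(arrival, free_at)         # wait in the queue if the server is busy
        free_at = start + service_time[cls]   # service is never interrupted
        out.append(free_at - arrival)
    return out
```

In the full simulator the same quantities are obtained through the REQUEST_SERVER and RELEASE_SERVER events, but the recursion makes explicit why RR and TU packets delay the data packets queued behind them.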

The main tasks of the results, analysis & confidence intervals unit are to 
compute the data packet mean response time, as well as other statistical in- 
formation, and to output the data to the result files. A batch means method 
implementation decides the endpoint of a simulation run according to chosen 
confidence coefficients. 
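The batch means stopping rule can be sketched as follows, using the parameters reported in Sect. 5 (batch size 20000, 2000 initially deleted observations, 10% accuracy). The 95% normal quantile (z = 1.96) is our assumption, since the confidence level is not stated. With few batches a Student t quantile would be more appropriate. Function names are hypothetical.

```python
import statistics


def batch_means(samples, batch_size, warmup, z=1.96):
    """Batch-means point estimate and confidence-interval half-width.

    Drops the first `warmup` observations, splits the rest into
    non-overlapping batches and treats batch averages as roughly i.i.d. normal.
    """
    data = samples[warmup:]
    k = len(data) // batch_size
    if k < 2:
        raise ValueError("need at least two full batches")
    means = [statistics.fmean(data[i * batch_size:(i + 1) * batch_size])
             for i in range(k)]
    grand = statistics.fmean(means)
    half = z * statistics.stdev(means) / k ** 0.5
    return grand, half


def precise_enough(samples, batch_size=20000, warmup=2000, accuracy=0.10):
    """Stopping rule: relative half-width at or below the target accuracy."""
    try:
        mean, half = batch_means(samples, batch_size, warmup)
    except ValueError:
        return False
    return mean > 0 and half / mean <= accuracy
```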

Function. Functionally, our simulator is discrete event based; consequently, it
relies on event analysis and interpretation. The events are processed within the
Event Dispatcher, an infinite loop that fetches events from a dedicated queue
and triggers the appropriate routine. Each event has a time stamp that determines
its position in the queue according to its time of occurrence. The event types
used in our simulator are shown in Table 1.



Table 1. Event types

Event type        Description

ARRIVAL           A packet has arrived at a home agent. After the performance
                  counters are initialized, a "REQUEST_SERVER" event is sent
                  to the event queue.

REQUEST_SERVER    A packet requests the HA server queuing facility. The request
                  contains the packet ID as well as the required service time,
                  according to the packet type.

RELEASE_SERVER    A packet has finished service at an HA server queue, and
                  several actions need to be taken, such as scheduling a new
                  packet arrival, computing the packet response time, etc.

BEGIN_BURST       Turns a given source ON. A source in the ON state sends
                  packets with exponentially distributed inter-arrival times.

END_BURST         Similarly, turns a given source OFF. In the OFF state, a
                  source is silent (sends no packets).

CHANGE_HA         Triggers a binding switch from one HA to another as a result
                  of a load balancing action.

UPDATE_TABLE      Activated by an HA in order to send a table update packet to
                  the other home agents in the home network.
4.3 Protocol Considerations

When an MN registers from a new location, it selects one of the HAs and sends
it the registration request. If the MN needs to be relocated from $HA_i$ to a
different $HA_j$ due to a load balancing action, a similar registration request
is sent by $HA_i$ to $HA_j$, with the destination IP field set to the IP address
of $HA_j$.

In the protocol proposed by [1], there was an additional table maintained at 
each HA in order to ensure that only one HA can act as a home agent for a 
certain mobile node at any moment. 

A special field inside the table ensured that a certain MN was registered with
only one HA at a time. To keep the information up to date, every time a new
registration request was processed by an HA, the HA also sent a copy to the
other HAs, so that the new record was consistently added to their binding caches.

With this approach, however, copies of the registration request are distributed
among the HAs every time a mobile node re-registers with a home agent because
its previous registration lifetime is about to expire. Therefore, if the lifetime
negotiated during the registration procedure is small enough, a large number of
copies of registration requests could be re-sent among the HAs, causing
additional overhead and degrading the overall performance.

In our simulation study, a home agent ($HA_i$) receives a registration request
packet in only one of two cases: 1) a mobile node sends an RR packet directly
to the HA; 2) another HA sends an RR packet as a result of a load balancing
action. At the current stage, for the sake of simplicity, we do not take into
account the re-RR packets sent as a result of lifetime expiration. In other words,
we consider that all registration requests are granted a lifetime value of 65,535,
which corresponds to infinity. However, a possible way to avoid multicasting
copies of re-RR packets to all the HAs in the home network could rely on a "most
recently moved" (MRM) cache maintained by each HA. An $HA_i$ keeping such
a cache could maintain a list of all the bindings that it has load balanced
to other HAs. An entry in this cache would include the identification code of the
mobile node and the HA to which it has been relocated. An entry can be
removed in either of two cases: 1) a given timeout occurs; 2) the MN has sent a
re-RR to $HA_i$, and the request has been forwarded to the HA ($HA_j$) that appears
to hold the binding (according to $HA_i$'s MRM entry). If $HA_j$ does not hold
the binding (because it has forwarded it to another $HA_k$), it can forward the
re-RR packet to $HA_k$ according to the same mechanism. Using such a mechanism,
a re-RR packet is sent to only one HA in most cases, avoiding potential
traffic overhead.
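The MRM forwarding idea can be sketched as follows. This is only an illustration of the proposal above, which the simulator does not implement. Class and method names are hypothetical, and the timeout is checked lazily when an entry is used. Returning None stands for the fallback case (no usable MRM entry), which the text leaves open.

```python
class HomeAgent:
    """Hypothetical HA keeping bindings plus a "most recently moved" (MRM) cache."""

    def __init__(self, name, mrm_timeout):
        self.name = name
        self.mrm_timeout = mrm_timeout
        self.bindings = set()  # MNs whose binding this HA currently holds
        self.mrm = {}          # MN id -> (HA the binding was moved to, time of move)

    def move_binding(self, mn, target, now):
        """Load balancing action: hand the binding to `target` and remember it."""
        self.bindings.discard(mn)
        target.bindings.add(mn)
        self.mrm[mn] = (target, now)

    def handle_re_rr(self, mn, now, path=()):
        """Follow MRM entries until the HA holding the binding is reached.

        Returns the names of the HAs visited, or None if the chain breaks.
        """
        path = path + (self.name,)
        if mn in self.bindings:
            return list(path)
        entry = self.mrm.pop(mn, None)  # entry removed once the re-RR is forwarded
        if entry is None:
            return None
        target, moved_at = entry
        if now - moved_at > self.mrm_timeout:  # stale entry: treat as timed out
            return None
        return target.handle_re_rr(mn, now, path)
```

In the common case the chain has length one, so a single forwarded re-RR replaces a multicast to all HAs.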

5 Results 

The results presented in this section show the mean packet response time
against increasing values of the policy counter. This counter is defined according
to each load balancing policy's profile.
In all the figures presented, the mean response time is normalized to the
mean data packet service time. For all the simulation experiments, a confidence
interval method (batch means) was used. The batch size is 20000, the number
of initially deleted observations is 2000, and the accuracy coefficient is set to
10%.

We adopt the following notation: 

N     The number of home agents in the system;
S     The number of transmitting sources (equal to the number of mobile nodes,
      because each source represents the aggregated traffic for one mobile node);
σ1    The rate at which a source turns ON;
σ2    The rate at which a source turns OFF (the OFF and ON periods are
      exponentially distributed with rates σ1 and σ2, respectively);
λ     The arrival rate of packets coming from one source in the ON state;
μ     The service rate of each home agent;
λm    The data packet mean arrival rate per home agent;
SSD   Source Switching Delay: the value at which each dynamic load balancing
      policy's transfer counter expires, expressed in that policy's own units;
R     The mean response time obtained with a dynamic load balancing policy;
Rs    The mean response time obtained with the static load balancing policy.

5.1 Timer-Based Policy 

In Fig. 3 we plot the data packet mean response time ratio against the timer 
delay. 

Each dot represents a full simulation run, using timer values between 1 and
100 seconds. Although the source traffic pattern has been altered in order to
provide different degrees of burstiness, the mean arrival rate has been kept
constant by varying λ accordingly.

It can be noticed that the non-bursty case provides better performance than
the bursty case. Both curves initially descend sharply: at small SSD values,
sources are transferred very often, and the additional registration packets sent
every time a source transfer is initiated inflate the response time. In Table 2
we present the number of transfers (registration request packets) that occur in
both cases. The values are comparable because the transfers take place at the
same intervals, regardless of the number of packets in the queues.

Based on the mean response time of the data packets, we compute the
performance gain of the considered load balancing policy with respect to a static
policy (in which the sources are evenly divided among the home agents). In
Fig. 4, we plot the performance gain against SSD for both the bursty and
non-bursty cases. The performance gain is defined as:

$$PG = 100 \cdot \frac{R_s - R}{R_s} \qquad (1)$$
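A small Python helper computing the performance gain is shown below, assuming Eq. (1) reads PG = 100 (Rs - R)/Rs, so that PG is positive whenever the dynamic policy yields a lower mean response time than the static one. The function names are illustrative.

```python
def performance_gain(r_dynamic, r_static):
    """Performance gain (in percent) of a dynamic policy over the static policy."""
    return 100.0 * (r_static - r_dynamic) / r_static


def ssd_values_with_gain(ssd_values, r_dynamic_values, r_static):
    """SSD settings at which the dynamic policy beats the static one (PG > 0)."""
    return [ssd for ssd, r in zip(ssd_values, r_dynamic_values)
            if performance_gain(r, r_static) > 0]
```

The static response time does not depend on SSD, so a single Rs value serves as the reference for the whole sweep.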
Fig. 3. Timer policy (panel title: "Timer Policy - 4 sources, 8 servers; comparison
between the non-bursty and bursty traffic cases"). N=4, S=8. Non-bursty case:
σ1=σ2=50, λ=1, μ=2.5. Bursty case: σ2=90, σ1=10, λ=5, μ=2.5. The data packet
mean response time, normalized to the mean service time, is plotted against SSD



Table 2. Registration Request Packets

SSD [s]            1         2         3        30       40       90
Non-bursty   2304227   1522397   1094424    123513    93944    40457
Bursty       2380060   1493361   1109977    126617    93998    42315



As Fig. 3 and Fig. 4 show, the performance gain is significantly higher in
the case of a bursty traffic shape. It has positive values in the [0 s ... 30 s] range,
which means that the timer-based policy behaves better than the static policy
within this SSD range. For larger values of SSD, the static policy performs better
than the timer policy. In fact, for all the load balancing policies tested, the
burstier the traffic, the higher the performance gain obtained. Therefore, in all
subsequent cases we present results obtained using relatively bursty traffic shapes.

5.2 The Double Threshold Policy and System Dimensioning 

In a real implementation of a mobile IP based network (for instance, ISPs
adding mobility support to their service offerings), the size of a multiple home
agents system may need adjustment in order to obtain an optimal price/performance
ratio. In other words, for a given traffic shape, it could be useful to find the
number of home agents that provides optimal performance within a given budget.
In the following, we present the performance curves for several system
configurations. We do not, however, provide a cost estimation for such environments.

Fig. 4. Timer policy (panel title: "Timer vs Static - 4 sources, 8 servers; performance
gains for non-bursty and bursty cases"; x-axis: Source Switching Delay [s]). N=4, S=8.
Non-bursty case: σ1=σ2=50, λ=1, μ=2.5. Bursty case: σ2=90, σ1=10, λ=5, μ=2.5.
The performance gain is plotted against SSD

In Fig. 5, we plot the results obtained using the double threshold policy in the
case of N=4 HAs (servers) and S=64 sources. For comparison purposes, we also
show the behaviour of the timer-based and static policies for similar parameters.

We can see that the double threshold policy generally outperforms the
timer-based policy. Both the double threshold and the timer-based policies show a
descending trend at first. For small values of SSD (a time counter in this case),
the timer policy transfers sources at a high rate, each source every SSD seconds,
regardless of the HA queue size. In the case of the double threshold policy, a low
SSD (here equivalent to LT) causes all the HA queues to change quickly into the
"normal" and then the "full" state, because an average of 16 sources per HA
rapidly piles up data packets in each HA queue. Thus, all the queues are in
states other than "empty", hence no (or only a small number of) table update
packets are forwarded among the agents, and less overhead is produced. Fewer
changes in queue state mean fewer transfers and fewer registration packets, but
also less benefit from load balancing. However, in this particular case (large S
and small N), even under bursty traffic conditions, the sources assigned to one
HA tend to compensate for each other, so that, overall, the incoming data packets
are fairly well balanced among the HAs. Because the double threshold policy takes
into account the total HA queue (the sum of the packets coming from all the
sources assigned to the respective HA), it initially performs better than the
timer policy, which works merely on a per-source basis, regardless of the total
queue size. The static policy, however, initially outperforms the double threshold
policy because it produces no overhead in a relatively balanced system.

Fig. 5. Double threshold policy vs. timer and static policies (panel title: "Double
Threshold vs. Timer Static - 4 servers, 64 sources; normalized mean response times
for a highly bursty traffic"; x-axis: Source Switching Delay [packets/s]). N=4, S=64,
σ2=97.5, σ1=2.5, λ=5, μ=10

The same trend can be noticed in Fig. 6 and Fig. 7. In the latter case,
however, the timer-based policy initially performs better than the double threshold
policy. This happens because of the larger number of table update packets
compared to the previous cases of 4 and 8 servers: as the number of servers grows,
a significant increase in the number of update packets is expected. As the LT level
increases, the benefits of load balancing begin to compensate for the overhead
(as the number of overhead packets decreases, too), reaching a compromise, so that
from SSD = 40 onwards the performance curves flatten close to the static behaviour.

This phenomenon can be visualized in Fig. 8, which shows the number of table
update packets sent in the system over the total simulation time. While this
number is considerably high for low values of SSD, it drops dramatically for higher
values. This can be explained by the fact that for higher LT levels, the queue size
of a home agent seldom becomes large enough to reach LT and cause a change of
state, which would trigger a table update message. This also explains why the
double threshold policy behaves much like the static policy for larger values of
SSD. Indeed, as the number of table update packets decreases to zero, fewer and
fewer changes of queue state occur, up to the point where no HA reaches the
"full" state. At that point, the double threshold policy behaves like a static
policy because no further source transfers are made.

A comparative analysis of Figs. 5, 6, and 7 shows that the double threshold
policy performs better than the timer policy, but worse than (or, at best, the
same as) the static policy. In Fig. 9 we plot the performance gain of the double
threshold policy over the static policy, for N=4, 8, 16. The steeply ascending
slopes at the beginning show that, at very small values of SSD (in this case,
LT), the double threshold policy saturates the HAs with table update packets
(a seven-figure number for N=4).

Fig. 6. Double threshold policy vs. timer and static policies (panel title: "Double
Threshold vs. Timer Static - 8 servers, 64 sources; normalized mean response times
for a highly bursty traffic"). N=8, σ2=97.5, σ1=2.5, λ=5

Fig. 7. Double threshold policy vs. timer and static policies (panel title: "Double
Threshold vs. Timer Static - 16 servers, 64 sources; normalized mean response times
for a highly bursty traffic"). N=16, S=64, σ2=97.5, σ1=2.5, λ=5, μ=10

Fig. 8. Double threshold policy (panel title: "Double Threshold - the Table Update
Packets for 4, 8 and 16 servers"; x-axis: Lower Threshold Level [packets]). N=4, 8, 16,
S=64, σ2=97.5, σ1=2.5, λ=5, μ=10. The total number of table update packets in the
system is plotted against SSD (in this case, LT)

Both Fig. 5 and Fig. 9 could indicate an interval of maximum performance
of the double threshold policy for SSD values within [8 ... 25], although the
variations are within the 10% accuracy of the confidence intervals.

Based on the simulation data collected so far, we cannot yet isolate the
conditions under which the double threshold policy performs clearly better
than the static policy. There are several possible reasons for this:

— In its current implementation, the double threshold policy produces more 
overhead than benefit; 

— The traffic pattern is not adequately chosen for this purpose; 

— Other simulation parameters need further adjustment (there are in total 7 
independent parameters for the double threshold policy). 



6 Conclusions and Future Work 

We have developed a simulator by which we studied the performance of a mul- 
tiple home agents architecture in a mobile IP network. Our simulator could be 
used with little modification in any multiple agents scheme based on dynamic 
mobile host allocation using additional overhead registration messages. 
Fig. 9. Double threshold policy vs. static policy (panel title: "Double threshold vs.
static - 4, 8, 16 servers, 64 sources; performance gain under a highly bursty traffic";
x-axis: Source Switching Delay [packets]). N=4, 8, 16, σ2=97.5, σ1=2.5, λ=5, μ=10.
The performance gain is plotted against SSD (LT)



The results showed that the overall performance is highly dependent on the
traffic patterns. Relatively bursty traffic generally yielded a higher performance
gain than non-bursty traffic. Some load balancing policies that were expected to
perform better than others occasionally showed the opposite behaviour. This
turned out to be highly dependent on the traffic pattern, particularly on its
stochastic component.

We also compared the behaviour of several load balancing policies and
introduced a more realistic, customized double threshold load balancing policy.

Both the double threshold and the static policies showed better performance
than the timer policy. The load balancing effect of the double threshold policy
was significantly limited by the overhead of registration request and table update
packets at low values of SSD.

We showed the system behaviour for different numbers of home agents, keeping
all the other parameters constant. This could be useful for finding an optimal
price/performance ratio in future large-scale mobile IP deployments (for example,
by ISPs).

The first step of our future work will be to extend the model so that it
handles re-registration request messages appropriately, as described in Sect. 4.3.

Secondly, we are considering an extension to a hierarchical foreign agents
architecture that could be used, for instance, inside large ISP domains. According
to the model proposed in [13] (work in progress), the root of the hierarchy,
denoted as GFA (Gateway Foreign Agent), has special responsibilities that include
handling and directing all the traffic to the other foreign agents in the structure,
and it therefore becomes subject to potential overload and crashes.